Re: [HACKERS] logical decoding of two-phase transactions
On Mon, Dec 14, 2020 at 6:27 PM Amit Kapila wrote: > > On Tue, Dec 8, 2020 at 2:01 PM Ajin Cherian wrote: > > > > On Tue, Dec 1, 2020 at 6:26 PM Amit Kapila wrote: > > > > > To skip it, we need to send GID in begin message and then on > > > subscriber-side, check if the prepared xact already exists, if so then > > > set a flag. The flag needs to be set in begin/start_stream and reset > > > in stop_stream/commit/abort. Using the flag, we can skip the entire > > > contents of the prepared xact. In ReorderFuffer-side also, we need to > > > get and set GID in txn even when we skip it because we need to send > > > the same at commit time. In this solution, we won't be able to send it > > > during normal start_stream because by that time we won't know GID and > > > I think that won't be required. Note that this is only required when > > > we skipped sending prepare, otherwise, we just need to send > > > Commit-Prepared at commit time. > > > > I have implemented these changes and tested the fix using the test > > setup I had shared above and it seems to be working fine. > > I have also tested restarts that simulate duplicate prepares being > > sent by the publisher and verified that it is handled correctly by the > > subscriber. > > > > This implementation has a flaw in that it has used commit_lsn for the > prepare when we are sending prepare just before commit prepared. We > can't send the commit LSN with prepare because if the subscriber > crashes after prepare then it would update > replorigin_session_origin_lsn with that commit_lsn. Now, after the > restart, because we will use that LSN to start decoding, the Commit > Prepared will get skipped. To fix this, we need to remember the > prepare LSN and other information even when we skip prepare and then > use it while sending the prepare during commit prepared. > > Now, after fixing this, I discovered another issue which is that we > allow adding a new snapshot to prepared transactions via > SnapBuildDistributeNewCatalogSnapshot. We can only allow it to get > added to in-progress transactions. If you comment out the changes > added in SnapBuildDistributeNewCatalogSnapshot then you will notice > one test failure which indicates this problem. This problem was not > evident before the bug-fix in the previous paragraph because you were > using commit-lsn even for the prepare and newly added snapshot change > appears to be before the end_lsn. > > Some other assorted changes in various patches: > v31-0001-Extend-the-output-plugin-API-to-allow-decoding-o > 1. I have changed the filter_prepare API to match the signature with > FilterByOrigin. I don't see the need for ReorderBufferTxn or xid in > the API. > 2. I have expanded the documentation of 'Begin Prepare Callback' to > explain how a user can use it to detect already prepared transactions > and in which scenarios that can happen. > 3. Added a few comments in the code atop begin_prepare_cb_wrapper to > explain why we are adding this new API. > 4. Move the check whether the filter_prepare callback is defined from > filter_prepare_cb_wrapper to caller. This is similar to how > FilterByOrigin works. > 5. Fixed various whitespace and cosmetic issues. > 6. Update commit message to include two of the newly added APIs > > v31-0002-Allow-decoding-at-prepare-time-in-ReorderBuffer > 1. Changed the variable names and comments in DecodeXactOp. > 2. A new API for FilterPrepare similar to FilterByOrigin and use that > instead of ReorderBufferPrepareNeedSkip. > 3. 
In DecodeCommit, we need to update the reorderbuffer about the > surviving subtransactions for both ReorderBufferFinishPrepared and > ReorderBufferCommit because now both can process the transaction. > 4. Because, now we need to remember the prepare info even when we skip > it, I have simplified ReorderBufferPrepare API by removing the extra > parameters as that information will be now available via > ReorderBufferTxn. > 5. Updated comments at various places. > > v31-0006-Support-2PC-txn-pgoutput > 1. Added Asserts in streaming APIs on the subscriber-side to ensure > that we should never reach there for the already prepared transaction > case. We never need to stream the changes when we are skipping prepare > either because the snapshot was not consistent by that time or we have > already sent those changes before restart. Added the same Assert in > Begin and Commit routines because while skipping prepared txn, we must > not receive the changes of any other xact. > 2. > + /* > + * Flags are determined from the state of the transaction. We know we > + * always get PREPARE first and then [COMMIT|ROLLBACK] PREPARED, so if > + * it's already marked as committed then it has to be COMMIT PREPARED (and > + * likewise for abort / ROLLBACK PREPARED). > + */ > + if (rbtxn_commit_prepared(txn)) > + flags = LOGICALREP_IS_COMMIT_PREPARED; > + else if (rbtxn_rollback_prepared(txn)) > + flags = LOGICALREP_IS_ROLLBACK_PREPARED; > + else > + flags = LOGICALREP_IS_PREPARE; > > I don't
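To make the subscriber-side skip logic described above a bit more concrete, here is a small, self-contained C model of the behavior: a flag is set at BEGIN PREPARE when the GID is already known to be prepared (for example because it was applied before a restart), the transaction's contents are then ignored, and the flag is reset when the skipped PREPARE ends. All names here are invented for illustration; this is not code from the patch set.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_GIDS 8

static char prepared_gids[MAX_GIDS][64];
static int  n_prepared = 0;
static bool skipping_contents = false;   /* the flag from the discussion */

static bool
gid_already_prepared(const char *gid)
{
    for (int i = 0; i < n_prepared; i++)
        if (strcmp(prepared_gids[i], gid) == 0)
            return true;
    return false;
}

static void
handle_begin_prepare(const char *gid)
{
    /* Duplicate PREPARE after a restart: skip the whole contents. */
    skipping_contents = gid_already_prepared(gid);
    if (!skipping_contents)
        printf("starting prepared transaction %s\n", gid);
}

static void
handle_change(const char *change)
{
    if (skipping_contents)
        return;                  /* these changes were applied already */
    printf("applying: %s\n", change);
}

static void
handle_prepare(const char *gid)
{
    if (skipping_contents)
    {
        skipping_contents = false;   /* reset at the end of the skipped txn */
        printf("skipped duplicate PREPARE of %s\n", gid);
        return;
    }
    if (n_prepared < MAX_GIDS)
        snprintf(prepared_gids[n_prepared++], sizeof(prepared_gids[0]),
                 "%s", gid);
    printf("prepared %s\n", gid);
}

int
main(void)
{
    handle_begin_prepare("gid-1");       /* first delivery */
    handle_change("INSERT 1");
    handle_prepare("gid-1");

    handle_begin_prepare("gid-1");       /* redelivered after restart */
    handle_change("INSERT 1");           /* silently skipped */
    handle_prepare("gid-1");

    printf("COMMIT PREPARED gid-1\n");   /* still sent and applied as usual */
    return 0;
}

As in the discussion above, only the contents of the already-prepared transaction are skipped; the later COMMIT PREPARED (identified by the GID) is still applied normally.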
Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module
At Tue, 15 Dec 2020 23:10:28 +0900, Fujii Masao wrote in > > I pushed the following two patches. > > - v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch > > - v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch > > As I told in other thread [1], I'm thinking to revert this patch > because this change could have bad side-effect on the startup > process waiting for recovery conflict. > > Before applying the patch, the latch that the startup process > used to wait for recovery conflict was different from the latch > that SIGHUP signal handler or walreceiver process, etc set to > wake the startup process up. So SIGHUP or walreceiver didn't > wake the startup process waiting for recovery conflict up unnecessary. > > But the patch got rid of the dedicated latch for signaling > the startup process. This change forced us to use the same latch > to make the startup process wait or wake up. Which caused SIGHUP > signal handler or walreceiver proces to wake the startup process > waiting on the latch for recovery conflict up unnecessarily > frequently. > > While waiting for recovery conflict on buffer pin, deadlock needs > to be checked at least every deadlock_timeout. But that frequent > wakeups could prevent the deadlock timer from being triggered and > could delay that deadlock checks.

I thought that spurious wakeups don't harm, but actually ResolveRecoveryConflictWithBufferPin doesn't consider spurious wakeups. Only the timer woke up ResolveRecoveryConflictWithBufferPin before the patch came in; currently SIGHUP and XLogFlush() (on the walreceiver) also wake up the startup process. For a moment I thought that ResolveRecoveryConflictWithBufferPin should be woken up at shutdown time by the old recovery latch, but that's not the case since it wakes up after all blockers go away. It seems simpler to me to revert the patches than to make the function properly handle spurious wakeups.

> Therefore, I'm thinking to revert the commit ac22929a26 and > 113d3591b8. > > [1] > https://www.postgresql.org/message-id/a802f1c0-58d9-bd3f-bc3e-bdad54726...@oss.nttdata.com

As a result, I agree to revert them. But I think we need to add a comment explaining why we don't use MyLatch for recovery wakeups after reverting them. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
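Since the crux here is that ResolveRecoveryConflictWithBufferPin was written assuming only the timer could wake it, a minimal sketch of a wakeup-tolerant wait may help illustrate what "properly handle spurious wakeups" would mean. conflict_resolved() and wait_on_latch() below are invented placeholders, not the backend's actual functions.

#include <stdbool.h>

/* Invented stand-ins for "is the conflict gone?" and "wait on my latch". */
extern bool conflict_resolved(void);
extern void wait_on_latch(void);

/*
 * A wait that tolerates spurious wakeups: whatever set the latch (SIGHUP,
 * the walreceiver after XLogFlush(), ...), always re-check the desired
 * state and go back to waiting if it has not been reached yet.
 */
void
wait_until_conflict_resolved(void)
{
    while (!conflict_resolved())
        wait_on_latch();
}

Without such a loop around the buffer-pin wait, extra wakeups on the shared latch become a correctness concern rather than a mere inefficiency, which is what motivates the revert discussed above.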
Refactoring HMAC in the core code
Hi all, (Added Bruce and Daniel in CC:)

$subject has been mentioned a couple of times lately, mainly by me for the recent cryptohash refactoring that has been done. In the core code we use HMAC with SHA256 for SCRAM, but this logic should go through SSL libraries able to support it (OpenSSL and libnss allow that) so that requirements like FIPS can be pushed down to whatever lower-level library we are building with rather than to Postgres. FWIW, I have also bumped into this stuff as being a requirement for the recent thread about file-level encryption in [1], where the code makes use of HMAC with SHA512.

So, please find attached a patch set to rework all that. This provides a split similar to what I have done recently for cryptohash functions, with a fallback implementation located in src/common/hmac.c, which itself depends on the fallback implementations of the cryptohash functions. The OpenSSL part is done in hmac_openssl.c. There are five APIs to plug in HMAC implementations, to create, initialize, update, finalize and free an HMAC context, in a fashion similar to cryptohashes.

Regarding OpenSSL, upstream has lately changed the way HMACs can be controlled. 3.0.0 has introduced a new set of APIs, with compatibility macros for older versions, as mentioned here: https://www.openssl.org/docs/manmaster/man3/EVP_MAC_CTX_new.html The new APIs are named EVP_MAC_CTX_new() and such. I think that this is a bit too new to use though, as we need to support OpenSSL down to 1.0.1 on HEAD and because there are compatibility macros. So instead I have decided to rely on the older interface based on HMAC_Init_ex() and co: https://www.openssl.org/docs/manmaster/man3/HMAC.html

After that there is another point to note. In 1.1.0, any consumer of HMAC *has* to let OpenSSL allocate the HMAC context, as with cryptohashes, because the internals of the HMAC context are only known to OpenSSL. In 1.0.2 and older versions, it is possible to have access to HMAC_CTX. This requires an extra tweak in hmac_openssl.c where we need to {m,p}alloc by ourselves instead of calling HMAC_CTX_new() for 1.1.0 and 1.1.1, but some extra configure switches are able to do the trick. That means that we could use resowners only when building with OpenSSL >= 1.1.0 and not for older versions, but note that the patch uses resowners anyway, as a matter of simplicity. As the changes are local to a single file, that's easy enough to follow and update. It would be easy enough to rip out this code once support for older OpenSSL versions is removed.

Please note that I have added code that should be enough for the compilation on Windows, but I have not taken the time to check that. I have checked that things compile and that check-world passes with and without OpenSSL 1.1.1 on Linux though, so I guess that I have not messed up too badly. This stuff requires much more testing, like making sure that we are able to connect to PG correctly with SCRAM when using combinations like libpq based on OpenSSL and the backend using the fallback, or just checking the consistency of the results of computations with SQL functions or such.

An extra thing that can be done is to clean up pgcrypto's px-hmac.c, but this also requires SHA1 in cryptohash.c, something that I have submitted separately in [2]. So this could just be done later. This patch also updates the SCRAM code so that it no longer uses the dedicated SCRAM/HMAC routines but the generic HMAC routines instead.

Thoughts are welcome. I am adding that to the next CF.
[1]: https://www.postgresql.org/message-id/X9lhi1ht04I+v/r...@paquier.xyz [2]: https://commitfest.postgresql.org/31/2868/ Thanks, -- Michael From 26f9375c72a512640711953b6b52f227863e9113 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Wed, 16 Dec 2020 16:15:05 +0900 Subject: [PATCH] Refactor HMAC implementations --- src/include/common/hmac.h | 33 +++ src/include/common/md5.h | 2 + src/include/common/scram-common.h | 13 -- src/include/pg_config.h.in| 6 + src/include/utils/resowner_private.h | 7 + src/backend/libpq/auth-scram.c| 61 ++--- src/backend/utils/resowner/resowner.c | 61 + src/common/Makefile | 4 +- src/common/hmac.c | 312 ++ src/common/hmac_openssl.c | 242 src/common/scram-common.c | 158 +++-- src/interfaces/libpq/fe-auth-scram.c | 70 +++--- configure | 2 +- configure.ac | 2 +- src/tools/msvc/Mkvcbuild.pm | 2 + src/tools/msvc/Solution.pm| 4 + src/tools/pgindent/typedefs.list | 1 - 17 files changed, 782 insertions(+), 198 deletions(-) create mode 100644 src/include/common/hmac.h create mode 100644 src/common/hmac.c create mode 100644 src/common/hmac_openssl.c diff --git a/src/include/common/hmac.h b/src/include/common/hmac.h new file
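To illustrate the five-step interface described above (create, initialize, update, finalize, free), here is a hedged usage sketch of how a caller such as the SCRAM code might drive it. The type and function names below are assumptions modelled on the existing cryptohash interface, not copied from the attached patch.

#include <stdint.h>
#include <stddef.h>

typedef struct pg_hmac_ctx pg_hmac_ctx;          /* opaque, like cryptohash */

/* Assumed signatures; the real ones live in src/include/common/hmac.h. */
extern pg_hmac_ctx *pg_hmac_create(int digest_type);
extern int  pg_hmac_init(pg_hmac_ctx *ctx, const uint8_t *key, size_t len);
extern int  pg_hmac_update(pg_hmac_ctx *ctx, const uint8_t *data, size_t len);
extern int  pg_hmac_final(pg_hmac_ctx *ctx, uint8_t *dest);
extern void pg_hmac_free(pg_hmac_ctx *ctx);

#define DIGEST_SHA256  1      /* placeholder for the real enum value */
#define SHA256_LEN    32

static int
compute_hmac_sha256(const uint8_t *key, size_t keylen,
                    const uint8_t *msg, size_t msglen,
                    uint8_t out[SHA256_LEN])
{
    pg_hmac_ctx *ctx = pg_hmac_create(DIGEST_SHA256);

    if (ctx == NULL)
        return -1;
    if (pg_hmac_init(ctx, key, keylen) < 0 ||
        pg_hmac_update(ctx, msg, msglen) < 0 ||
        pg_hmac_final(ctx, out) < 0)
    {
        pg_hmac_free(ctx);
        return -1;
    }
    pg_hmac_free(ctx);
    return 0;
}

The point of the indirection is that the same calling code works whether the context is backed by OpenSSL's HMAC_* routines or by the fallback implementation in src/common/hmac.c.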
Re: Add Information during standby recovery conflicts
At Wed, 16 Dec 2020 14:49:25 +0900 (JST), Kyotaro Horiguchi wrote in > > > > Conflicting processes are 41171, 41194. > > > > Conflicting processes are: 41171, 41194. > > Or I came up with the following after scanning throught the tree. > > | Some processes are conflicting: 41171, 41194. Or Some processes are blocking recovery progress: 41171, 41194. ? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Add Information during standby recovery conflicts
At Wed, 16 Dec 2020 12:08:31 +0900, Masahiko Sawada wrote in On Wed, Dec 16, 2020 at 11:22 AM Kyotaro Horiguchi wrote: > > At Tue, 15 Dec 2020 15:40:03 +0900, Fujii Masao wrote in > > > > > > On 2020/12/15 12:04, Kyotaro Horiguchi wrote: > > > [40509:startup] DETAIL: Conflicting processes: 41171, 41194. > ... > > > I'm not sure, but it seems to me at least the period is unnecessary > > > here. > > > > Since Error Message Style Guide in the docs says "Detail and hint > > messages: > > Use complete sentences, and end each with a period.", I think that a > > period > > is necessary here. No? > > In the first place it is not a complete sentence. Might be better be > something like this if we strictly follow the style guide? FWIW I borrowed the message style in errdetail from log messages in ProcSleep(): (errmsg("process %d still waiting for %s on %s after %ld.%03d ms", MyProcPid, modename, buf.data, msecs, usecs), (errdetail_log_plural("Process holding the lock: %s. Wait queue: %s.", "Processes holding the lock: %s. Wait queue: %s.", lockHoldersNum, lock_holders_sbuf.data, lock_waiters_sbuf.data; I was guessing that was the case. > > Conflicting processes are 41171, 41194. > > Conflicting processes are: 41171, 41194. Or I came up with the following after scanning throught the tree. | Some processes are conflicting: 41171, 41194. If we use the above message we might want to change other similar messages I exemplified as well. I'm not sure what should we do for other anomalies. Other errdetails of this category (incomplete sentences or the absence of a period) I found are: -- period is absent pgarch.c", auth.c
Re: Add Information during standby recovery conflicts
On Wed, Dec 16, 2020 at 11:22 AM Kyotaro Horiguchi wrote: > > At Tue, 15 Dec 2020 15:40:03 +0900, Fujii Masao > wrote in > > > > > > On 2020/12/15 12:04, Kyotaro Horiguchi wrote: > > > [40509:startup] DETAIL: Conflicting processes: 41171, 41194. > ... > > > I'm not sure, but it seems to me at least the period is unnecessary > > > here. > > > > Since Error Message Style Guide in the docs says "Detail and hint > > messages: > > Use complete sentences, and end each with a period.", I think that a > > period > > is necessary here. No? > > In the first place it is not a complete sentence. Might be better be > something like this if we strictly follow the style guide? FWIW I borrowed the message style in errdetail from log messages in ProcSleep(): (errmsg("process %d still waiting for %s on %s after %ld.%03d ms", MyProcPid, modename, buf.data, msecs, usecs), (errdetail_log_plural("Process holding the lock: %s. Wait queue: %s.", "Processes holding the lock: %s. Wait queue: %s.", lockHoldersNum, lock_holders_sbuf.data, lock_waiters_sbuf.data; > > Conflicting processes are 41171, 41194. > > Conflicting processes are: 41171, 41194. If we use the above message we might want to change other similar messages I exemplified as well. Regards, -- Masahiko Sawada EnterpriseDB: https://www.enterprisedb.com/
Re: Add Information during standby recovery conflicts
On Mon, Dec 14, 2020 at 9:31 PM Fujii Masao wrote: > > > > On 2020/12/05 12:38, Masahiko Sawada wrote: > > On Fri, Dec 4, 2020 at 7:22 PM Drouvot, Bertrand > > wrote: > >> > >> Hi, > >> > >> On 12/4/20 2:21 AM, Fujii Masao wrote: > >>> > >>> On 2020/12/04 9:28, Masahiko Sawada wrote: > On Fri, Dec 4, 2020 at 2:54 AM Fujii Masao > wrote: > > > > > > > > On 2020/12/01 17:29, Drouvot, Bertrand wrote: > >> Hi, > >> > >> On 12/1/20 12:35 AM, Masahiko Sawada wrote: > >>> CAUTION: This email originated from outside of the organization. > >>> Do not click links or open attachments unless you can confirm the > >>> sender and know the content is safe. > >>> > >>> > >>> > >>> On Tue, Dec 1, 2020 at 3:25 AM Alvaro Herrera > >>> wrote: > On 2020-Dec-01, Fujii Masao wrote: > > > + if (proc) > > + { > > + if (nprocs == 0) > > + appendStringInfo(&buf, "%d", proc->pid); > > + else > > + appendStringInfo(&buf, ", %d", proc->pid); > > + > > + nprocs++; > > > > What happens if all the backends in wait_list have gone? In > > other words, > > how should we handle the case where nprocs == 0 (i.e., nprocs > > has not been > > incrmented at all)? This would very rarely happen, but can happen. > > In this case, since buf.data is empty, at least there seems no > > need to log > > the list of conflicting processes in detail message. > Yes, I noticed this too; this can be simplified by changing the > condition in the ereport() call to be "nprocs > 0" (rather than > wait_list being null), otherwise not print the errdetail. (You > could > test buf.data or buf.len instead, but that seems uglier to me.) > >>> +1 > >>> > >>> Maybe we can also improve the comment of this function from: > >>> > >>> + * This function also reports the details about the conflicting > >>> + * process ids if *wait_list is not NULL. > >>> > >>> to " This function also reports the details about the conflicting > >>> process ids if exist" or something. > >>> > >> Thank you all for the review/remarks. > >> > >> They have been addressed in the new attached patch version. > > > > Thanks for updating the patch! I read through the patch again > > and applied the following chages to it. Attached is the updated > > version of the patch. Could you review this version? If there is > > no issue in it, I'm thinking to commit this version. > > Thank you for updating the patch! I have one question. > > > > > + timeouts[cnt].id = STANDBY_TIMEOUT; > > + timeouts[cnt].type = TMPARAM_AFTER; > > + timeouts[cnt].delay_ms = DeadlockTimeout; > > > > Maybe STANDBY_TIMEOUT should be STANDBY_DEADLOCK_TIMEOUT here? > > I changed the code that way. > > As the comment of ResolveRecoveryConflictWithLock() says the > following, a deadlock is detected by the ordinary backend process: > > * Deadlocks involving the Startup process and an ordinary backend > proces > * will be detected by the deadlock detector within the ordinary > backend. > > If we use STANDBY_DEADLOCK_TIMEOUT, > SendRecoveryConflictWithBufferPin() will be called after > DeadlockTimeout passed, but I think it's not necessary for the startup > process in this case. > >>> > >>> Thanks for pointing this! You are right. > >>> > >>> > If we want to just wake up the startup process > maybe we can use STANDBY_TIMEOUT here? > >>> > >> Thanks for the patch updates! Except what we are still discussing below, > >> it looks good to me. > >> > >>> When STANDBY_TIMEOUT happens, a request to release conflicting buffer > >>> pins is sent. Right? If so, we should not also use STANDBY_TIMEOUT there? 
> >> > >> Agree > >> > >>> > >>> Or, first of all, we don't need to enable the deadlock timer at all? > >>> Since what we'd like to do is to wake up after deadlock_timeout > >>> passes, we can do that by changing ProcWaitForSignal() so that it can > >>> accept the timeout and giving the deadlock_timeout to it. If we do > >>> this, maybe we can get rid of STANDBY_LOCK_TIMEOUT from > >>> ResolveRecoveryConflictWithLock(). Thought? > > > > Where do we enable deadlock timeout in hot standby case? You meant to > > enable it in ProcWaitForSignal() or where we set a timer for not hot > > standby case, in ProcSleep()? > > No, what I tried to say is to change ResolveRecoveryConflictWithLock() so > that it does > > 1. calculate the "minimum" timeout from deadlock_timeout and > max_standby_xxx_delay > 2. give the
Re: xid wraparound danger due to INDEX_CLEANUP false
On Sat, Nov 21, 2020 at 8:03 AM Peter Geoghegan wrote: > > On Fri, Nov 20, 2020 at 2:17 PM Robert Haas wrote: > > > Does that make sense? > > > > I *think* so. For me the point is that the index never has a right to > > insist on being vacuumed, but it can offer an opinion on how helpful > > it would be. > > Right, that might be the single most important point. It's a somewhat > more bottom-up direction for VACUUM, that is still fundamentally > top-down. Because that's still necessary. > > Opportunistic heap pruning is usually very effective, so today we > realistically have these 4 byte line pointers accumulating in heap > pages. The corresponding "bloatum" in index pages is an index tuple + > line pointer (at least 16 bytes + 4 bytes). Meaning that we accumulate > that *at least* 20 bytes for each 4 bytes in the table. And, indexes > care about *where* items go, making the problem even worse. So in the > absence of index tuple LP_DEAD setting/deletion (or bottom-up index > deletion in Postgres 14), the problem in indexes is probably at least > 5x worse. > > The precise extent to which this is true will vary. It's a mistake to > try to reason about it at a high level, because there is just too much > variation for that approach to work. We should just give index access > methods *some* say. Sometimes this allows index vacuuming to be very > lazy, other times it allows index vacuuming to be very eager. Often > this variation exists among indexes on the same table. > > Of course, vacuumlazy.c is still responsible for not letting the > accumulation of LP_DEAD heap line pointers get out of hand (without > allowing index TIDs to point to the wrong thing due to dangerous TID > recycling issues/bugs). The accumulation of LP_DEAD heap line pointers > will often take a very long time to get out of hand. But when it does > finally get out of hand, index access methods don't get to veto being > vacuumed. Because this isn't actually about their needs anymore. > > Actually, the index access methods never truly veto anything. They > merely give some abstract signal about how urgent it is to them (like > the 0.0 - 1.0 thing). This difference actually matters. One index > among many on a table saying "meh, I guess I could benefit from some > index vacuuming if it's no real trouble to you vacuumlazy.c" rather > than saying "it's absolutely unnecessary, don't waste CPU cycles > vacuumlazy.c" may actually shift how vacuumlazy.c processes the heap > (at least occasionally). Maybe the high level VACUUM operation decides > that it is worth taking care of everything all at once -- if all the > indexes together either say "meh" or "now would be a good time", and > vacuumlazy.c then notices that the accumulation of LP_DEAD line > pointers is *also* becoming a problem (it's also a "meh" situation), > then it can be *more* ambitious. It can do a traditional VACUUM early. > Which might still make sense. > > This also means that vacuumlazy.c would ideally think about this as an > optimization problem. It may be lazy or eager for the whole table, > just as it may be lazy or eager for individual indexes. (Though the > eagerness/laziness dynamic is probably much more noticeable with > indexes in practice.) > I discussed this topic off-list with Peter Geoghegan. And we think that we can lead this fix to future improvement. Let me summarize the proposals. The first proposal is the fix of this inappropriate behavior discussed on this thread. 
We pass a new flag when calling bulkdelete(), indicating whether or not the index can safely skip this bulkdelete() call. This is equivalent to whether or not lazy vacuum will do the heap clean (making LP_DEAD line pointers LP_UNUSED in lazy_vacuum_heap()). If it's true (meaning we will do the heap clean), then since dead tuples referenced by index tuples will be physically removed, the index AM would have to delete the index tuples. If it's false, we call bulkdelete() with this flag so that the index AM can safely skip bulkdelete(). Of course, an index AM can also choose not to skip it for its own reasons. An index AM such as BRIN that doesn't store heap TIDs can decide whether or not to do so regardless of this flag. The next proposal, on top of the above, is to add a new index AM API, say ambulkdeletestrategy(), which is called before bulkdelete() for each index and asks for the index's bulk-deletion strategy. In this API, lazy vacuum asks, "Hey index X, I collected garbage heap tuples during the heap scan, how urgent is vacuuming for you?", and the index answers either "it's urgent" when it wants to do bulk-deletion or "it's not urgent, I can skip it". The point of this proposal is to isolate heap vacuum and index vacuum for each index so that we can employ different strategies for each index. Lazy vacuum can decide whether or not to do the heap clean based on the answers from the indexes, and can set the flag I proposed above according to that decision. If all indexes answer 'yes' (meaning they will do bulkdelete()), lazy vacuum
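A toy model of the proposed flow may make the division of labour clearer. The enum, callback shape, and decision rule below are illustrative guesses at what ambulkdeletestrategy() could look like, not the proposed API itself; in particular it glosses over TID-less AMs such as BRIN, whose answer would not need to block heap cleaning.

#include <stdbool.h>
#include <stdio.h>

typedef enum
{
    BULKDELETE_SKIP,   /* "not urgent, I can skip this round" */
    BULKDELETE_DO      /* "urgent, please call bulkdelete()" */
} BulkDeleteStrategy;

typedef struct IndexStub
{
    const char *name;
    /* stand-in for the proposed per-AM ambulkdeletestrategy() callback */
    BulkDeleteStrategy (*strategy) (void);
} IndexStub;

static BulkDeleteStrategy btree_strategy(void) { return BULKDELETE_DO; }
static BulkDeleteStrategy gin_strategy(void)   { return BULKDELETE_SKIP; }

int
main(void)
{
    IndexStub indexes[] = {
        {"idx_btree", btree_strategy},
        {"idx_gin", gin_strategy},
    };
    int     nindexes = 2;
    bool    do_heap_clean = true;

    /*
     * Ask every index for its strategy.  Heap line pointers can only be
     * set LP_UNUSED once every TID-bearing index has removed its pointers,
     * so one "skip" answer is enough to postpone the heap clean.
     */
    for (int i = 0; i < nindexes; i++)
    {
        BulkDeleteStrategy s = indexes[i].strategy();

        printf("%s says %s\n", indexes[i].name,
               s == BULKDELETE_DO ? "do it" : "skip");
        if (s == BULKDELETE_SKIP)
            do_heap_clean = false;
    }

    /* This decision would then feed the new flag passed to bulkdelete(). */
    printf("heap clean this cycle: %s\n", do_heap_clean ? "yes" : "no");
    return 0;
}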
Re: Misleading comment in prologue of ReorderBufferQueueMessage
On Tue, Dec 15, 2020 at 11:25 AM Ashutosh Bapat wrote: > > On Mon, Dec 14, 2020 at 3:14 PM Amit Kapila wrote: >> >> On Mon, Dec 14, 2020 at 2:45 PM Ashutosh Bapat >> wrote: >> > >> > The name of the function suggests that the given message will be queued in >> > ReorderBuffer. The prologue of the function says so too >> > 776 /* >> > 777 * Queue message into a transaction so it can be processed upon >> > commit. >> > 778 */ >> > It led me to think that a non-transactional message is processed along >> > with the surrounding transaction, esp. when it has an associated xid. >> > >> > But in reality, the function queues only a transactional message and >> > decoders a non-transactional message immediately without waiting for a >> > commit. >> > >> > We should modify the prologue to say >> > "Queue a transactional message into a transaction so that it can be >> > processed upon commit. A non-transactional message is processed >> > immediately." and also change the name of the function to >> > ReorderBufferProcessMessage(), but the later may break API compatibility. >> > >> >> +1 for the comment change but I am not sure if it is a good idea to >> change the API name. >> > Can you please review wording? I will create a patch with updated wording. > How about something like below: A transactional message is queued to be processed upon commit and a non-transactional message gets processed immediately. OR A transactional message is queued so it can be processed upon commit and a non-transactional message gets processed immediately. -- With Regards, Amit Kapila.
Re: Add Information during standby recovery conflicts
At Tue, 15 Dec 2020 15:40:03 +0900, Fujii Masao wrote in > > > On 2020/12/15 12:04, Kyotaro Horiguchi wrote: > > [40509:startup] DETAIL: Conflicting processes: 41171, 41194. ... > > I'm not sure, but it seems to me at least the period is unnecessary > > here. > > Since Error Message Style Guide in the docs says "Detail and hint > messages: > Use complete sentences, and end each with a period.", I think that a > period > is necessary here. No? In the first place it is not a complete sentence. Might be better be something like this if we strictly follow the style guide? > Conflicting processes are 41171, 41194. > Conflicting processes are: 41171, 41194. > > > + boolmaybe_log_conflict = > > + (standbyWaitStart != 0 && !logged_recovery_conflict); > > odd indentation. > > This is the result of pgindent run. I'm not sure why pgindent indents > that way, but for now I'd like to follow pgindent. Agreed. > > + /* Also, set the timer if necessary */ > > + if (logging_timer) > > + { > > + timeouts[cnt].id = STANDBY_LOCK_TIMEOUT; > > + timeouts[cnt].type = TMPARAM_AFTER; > > + timeouts[cnt].delay_ms = DeadlockTimeout; > > + cnt++; > > + } > > This doesn't consider spurious wakeup. I'm not sure it actually > > happenes but we usually consider that. That is since before this > > patch, but ProcWaitForSignal()'s comment says that: > > > >> * As this uses the generic process latch the caller has to be robust > >> * against > >> * unrelated wakeups: Always check that the desired state has occurred, > >> * and > >> * wait again if not. > > If we don't care of spurious wakeups, we should add such a comment. > > If ProcWaitForSignal() wakes up because of the reason (e.g., SIGHUP) > other than deadlock_timeout, ProcSleep() will call > ResolveRecoveryConflictWithLock() and we sleep on ProcWaitForSignal() > again since the recovery conflict has not been resolved yet. So we can > say that we consider "spurious wakeups"? So, the following seems to be spurious wakeups.. > However when I read the related code again, I found another issue in > the patch. In ResolveRecoveryConflictWithLock(), if SIGHUP causes us > to > wake up out of ProcWaitForSignal() before deadlock_timeout is reached, > we will use deadlock_timeout again when sleeping on > ProcWaitForSignal(). > Instead, probably we should use the "deadlock_timeout - elapsed time" > so that we can emit a log message as soon as deadlock_timeout passes > since starting waiting on recovery conflict. Otherwise it may take at > most > "deadlock_timeout * 2" to log "still waiting ..." message. > > To fix this issue, "deadlock_timeout - elapsed time" needs to be used > as > the timeout when enabling the timer at least in > ResolveRecoveryConflictWithLock() and > ResolveRecoveryConflictWithBufferPin(). > Also similar change needs to be applied to > ResolveRecoveryConflictWithVirtualXIDs(). > > BTW, without applying the patch, *originally* > ResolveRecoveryConflictWithBufferPin() seems to have this issue. > It enables deadlock_timeout timer so as to request for hot-standbfy > backends to check themselves for deadlocks. But if we wake up out of > ProcWaitForSignal() before deadlock_timeout is reached, the subsequent > call to ResolveRecoveryConflictWithBufferPin() also uses > deadlock_timeout > again instead of "deadlock_timeout - elapsed time". So the request for > deadlock check can be delayed. 
Furthermore, > if ResolveRecoveryConflictWithBufferPin() always wakes up out of > ProcWaitForSignal() before deadlock_timeout is reached, the request > for deadlock check may also never happen infinitely. > > Maybe we should fix the original issue at first separately from the > patch. Yeah, it's not an issue of this patch. > > + boolmaybe_log_conflict; > > + boolmaybe_update_title; > > Although it should be a matter of taste and I understand that the > > "maybe" means that "that logging and changing of ps display may not > > happen in this iteration" , that variables seem expressing > > respectively "we should write log if the timeout for recovery conflict > > expires" and "we should update title if 500ms elapsed". So they seem > > *to me* better be just "log_conflict" and "update_title". > > I feel the same with "maybe_log_conflict" in ProcSleep(). > > I have no strong opinion about those names. So if other people also > think so, I'm ok to rename them. Shorter is better as far as it makes sense and not-too abbreviated. > > + for recovery conflicts. This is useful in determining if recovery > > +conflicts prevent the recovery from applying WAL. > > (I'm not confident on this) Isn't the sentence better be in past or > > present continuous tense? > > Could you tell me why you think that's better? To make sure, I mention
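A small arithmetic sketch of the "deadlock_timeout - elapsed time" idea discussed above: when a wait returns early (for example on SIGHUP), the timer should be re-armed with the time still remaining rather than the full deadlock_timeout, so the "still waiting" log line fires at most deadlock_timeout after the wait started, not up to twice that. This is plain C with millisecond integers, not the backend's timestamp types.

#include <stdio.h>

static long
remaining_timeout_ms(long deadlock_timeout_ms, long wait_start_ms, long now_ms)
{
    long elapsed = now_ms - wait_start_ms;

    if (elapsed >= deadlock_timeout_ms)
        return 0;                   /* already due: log immediately */
    return deadlock_timeout_ms - elapsed;
}

int
main(void)
{
    /* deadlock_timeout = 1000 ms; woken spuriously 400 ms into the wait */
    printf("re-arm with %ld ms\n", remaining_timeout_ms(1000, 0, 400));
    return 0;
}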
Re: archive status ".ready" files may be created too early
At Tue, 15 Dec 2020 19:32:57 +0900 (JST), Kyotaro Horiguchi wrote in > At Mon, 14 Dec 2020 18:25:23 +, "Bossart, Nathan" > wrote in > > I wonder if these are safe assumptions to make. For your example, if > > we've written record B to the WAL buffers, but neither record A nor B > > have been written to disk or flushed, aren't we still in trouble? > > You're right in that regard. There's a window where partial record is > written when write location passes F0 after insertion location passes > F1. However, remembering all spanning records seems overkilling to me. > > I modifed the previous patch so that it remembers the start LSN of the > *oldest* corss-segment continuation record in the last consecutive > bonded segments, and the end LSN of the latest cross-segmetn > continuation record. This doesn't foreget past segment boundaries. > The region is cleard when WAL-write LSN goes beyond the remembered end > LSN. So the region may contain several wal-segments that are not > connected to the next one, but that doesn't matter so much.

Mmm. Even though it's a PoC, it was too bogus. I fixed it to work in a saner way.

- Record the beginning LSN of the first cross-seg record and the end LSN of the last cross-seg record in a run of consecutive segments bonded by cross-seg records. Specifically, X and Y below.

X Z Y
[recA] [recB] [recC]
[seg A] [seg B] [seg C] [seg D] [seg E]
(1)(2.2)(2.2) (2.1) (2.1) (1)

1. If we wrote up to before X or beyond Y at a segment boundary, notify the finished segment immediately.
   1.1. If we have written beyond Y, clear the recorded region.
2. Otherwise we don't notify the segment immediately:
   2.1. If the write request was up to exactly the current segment boundary and we know the end LSN of the record there (that is, it is recC above), extend the request to that end LSN. Then notify the segment after the record is written to the end.
   2.2. Otherwise (that is, recA or recB), we don't know whether the last record of the last segment ends just at the segment boundary (Z) or a record spans between segments (recB). Anyway, even if there is such a record there, we don't know where it ends. As a result, all we can do there is refrain from notifying. It doesn't matter so much, since we have already inserted recC, so we will soon reach recC and will notify up to seg C.

There might be a case where we insert up to Y before writing up to Z; the segment region X-Y contains a non-connected segment boundary in that case. It is processed as if it were a connected segment boundary. However, as in 2.2 above, it doesn't matter since we write up to Y soon.

At Tue, 15 Dec 2020 19:32:57 +0900 (JST), Kyotaro Horiguchi wrote in me> I added an assertion that a record must be shorter than a wal segment me> to XLogRecordAssemble(). This guarantees the assumption to be true? me> (The condition is tentative, would need to be adjusted.) Changed the assertion in a more direct way. me> Also, the attached is a PoC. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 50b3b05dd0eed79cd0b97991e82090b9d569cbac Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Wed, 16 Dec 2020 10:36:14 +0900 Subject: [PATCH v4 1/2] Avoid archiving a WAL segment that continues to the next segment If the last record of a finshed segment continues to the next segment, we need to defer archiving of the segment until the record is flushed to the end. Otherwise crash recovery can overwrite the last record of a segment and history diverges between archive and pg_wal.
--- src/backend/access/transam/xlog.c | 214 +++- src/backend/replication/walsender.c | 14 +- src/include/access/xlog.h | 1 + 3 files changed, 217 insertions(+), 12 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 8dd225c2e1..8705809160 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -723,6 +723,16 @@ typedef struct XLogCtlData */ XLogRecPtr lastFpwDisableRecPtr; + /* The last segment notified to be archived. Protected by WALWriteLock */ + XLogSegNo lastNotifiedSeg; + + /* + * Remember the oldest and newest known segment that ends with a + * continuation record. + */ + XLogRecPtr firstSegContRecStart; + XLogRecPtr lastSegContRecEnd; + slock_t info_lck; /* locks shared variables shown above */ } XLogCtlData; @@ -1160,6 +1170,9 @@ XLogInsertRecord(XLogRecData *rdata, */ if (StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ) { + XLogSegNo startseg; + XLogSegNo endseg; + SpinLockAcquire(&XLogCtl->info_lck); /* advance global request to include new block(s) */ if (XLogCtl->LogwrtRqst.Write < EndPos) @@ -1167,6 +1180,24 @@ XLogInsertRecord(XLogRecData *rdata, /* update local result copy while I have the chance */ LogwrtResult = XLogCtl->LogwrtResult; SpinLockRelease(&XLo
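To restate the notification rule above in code form, here is a small, self-contained model of the "notify or defer" decision. The variable names mirror the firstSegContRecStart/lastSegContRecEnd fields visible in the patch, but the function itself is only an illustration, not the patch's logic verbatim.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtrSim;                 /* stand-in for XLogRecPtr */

static XLogRecPtrSim first_cont_rec_start = 0;  /* X in the diagram */
static XLogRecPtrSim last_cont_rec_end = 0;     /* Y in the diagram */

/* May we mark finished segments .ready after writing up to write_pos? */
static bool
may_notify_segments(XLogRecPtrSim write_pos)
{
    /* No pending cross-segment region recorded: always safe. */
    if (first_cont_rec_start == 0)
        return true;

    /* Step 1: before X or beyond Y, notify immediately. */
    if (write_pos < first_cont_rec_start || write_pos >= last_cont_rec_end)
    {
        if (write_pos >= last_cont_rec_end)
        {
            /* Step 1.1: the whole region is written out, forget it. */
            first_cont_rec_start = 0;
            last_cont_rec_end = 0;
        }
        return true;
    }

    /* Step 2: inside the X..Y region, hold back the notification. */
    return false;
}

int
main(void)
{
    first_cont_rec_start = 100;   /* X */
    last_cont_rec_end = 500;      /* Y */
    printf("%d %d %d\n",
           may_notify_segments(50),    /* before X: notify */
           may_notify_segments(300),   /* inside:   defer  */
           may_notify_segments(600));  /* beyond Y: notify */
    return 0;
}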
Re: pg_shmem_allocations & documentation
On Tue, Dec 15, 2020 at 10:09:35AM +0900, Michael Paquier wrote: > Both of you seem to agree about having more details about that, which > is fine by me at the end. Horiguchi-san, do you have more thoughts to > offer? Benoit's version is similar to yours, just simpler. Okay, applied this one then. Thanks Benoit and Horiguchi-san. -- Michael signature.asc Description: PGP signature
Re: enable_incremental_sort changes query behavior
On 12/16/20 1:51 AM, Jaime Casanova wrote: > On Tue, Dec 1, 2020 at 4:08 AM Anastasia Lubennikova > wrote: >> >> On 01.12.2020 03:08, James Coleman wrote: >>> On Tue, Nov 3, 2020 at 4:39 PM Tomas Vondra >>> wrote: I've pushed the 0001 part, i.e. the main fix. Not sure about the other parts (comments and moving the code back to postgres_fdw) yet. >>> I noticed the CF entry [1] got moved to the next CF; I'm thinking this >>> entry should be marked as committed since the fix for the initial bug >>> reported on this thread has been pushed. We have the parallel safety >>> issue outstanding, but there's a separate thread and patch for that, >>> so I'll make a new CF entry for that. >>> >>> I can mark it as committed, but I'm not sure how to "undo" (or if >>> that's desirable) the move to the next CF. >>> >>> Thoughts? >>> >>> James >>> >>> 1: https://commitfest.postgresql.org/30/2754/ >>> >>> >> Oops... >> I must have rushed with this one, thank you for noticing. >> I don't see how to move it back either. I think it's fine to mark it as >> Committed where it is now. >> > > BTW, I still see this one as needs review > Hmm, not sure what was the plan, but I'll mark it as committed once I push the remaining patches (for parallel-safe checks, SRFs etc.) in a couple days. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Fix generate_useful_gather_paths for parallel unsafe pathkeys
Hi, I reviewed the patch series, tweaked a couple comments, added commit messages etc. Barring objections, I'll push this in a couple days. One thing that annoys me is that it breaks ABI because it adds two new parameters to find_em_expr_usable_for_sorting_rel, but I don't think we can get around that. We could assume require_parallel_safe=true, but we still need to pass the PlannerInfo pointer. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company >From 857e653607656d10586e2a0f1b936576513d4277 Mon Sep 17 00:00:00 2001 From: jcoleman Date: Fri, 20 Nov 2020 16:19:00 + Subject: [PATCH 1/5] Consider unsorted paths in generate_useful_gather_paths generate_useful_gather_paths used to skip unsorted paths (without any pathkeys), but that is unnecessary - the later code actually can handle such paths just fine by adding a Sort node. This is clearly a thinko, preventing construction of useful plans. Backpatch to 13, where this code was introduced (for Incremental Sort). Author: James Coleman Reviewed-by: Tomas Vondra Backpatch-through: 13 Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs=hC0mSksZC=h5m8lschj5e5oxp...@mail.gmail.com --- src/backend/optimizer/path/allpaths.c | 8 src/test/regress/expected/incremental_sort.out | 13 + src/test/regress/sql/incremental_sort.sql | 4 3 files changed, 17 insertions(+), 8 deletions(-) diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 84a69b064a..672a20760c 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -2914,14 +2914,6 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r Path *subpath = (Path *) lfirst(lc2); GatherMergePath *path; - /* - * If the path has no ordering at all, then we can't use either - * incremental sort or rely on implicit sorting with a gather - * merge. - */ - if (subpath->pathkeys == NIL) -continue; - is_sorted = pathkeys_count_contained_in(useful_pathkeys, subpath->pathkeys, &presorted_keys); diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out index 7cf2eee7c1..51471ae92d 100644 --- a/src/test/regress/expected/incremental_sort.out +++ b/src/test/regress/expected/incremental_sort.out @@ -1468,6 +1468,19 @@ explain (costs off) select * from t union select * from t order by 1,3; -> Parallel Seq Scan on t t_1 (13 rows) +-- Full sort, not just incremental sort can be pushed below a gather merge path +-- by generate_useful_gather_paths. +explain (costs off) select distinct a,b from t; +QUERY PLAN +-- + Unique + -> Gather Merge + Workers Planned: 2 + -> Sort + Sort Key: a, b + -> Parallel Seq Scan on t +(6 rows) + drop table t; -- Sort pushdown can't go below where expressions are part of the rel target. -- In particular this is interesting for volatile expressions which have to diff --git a/src/test/regress/sql/incremental_sort.sql b/src/test/regress/sql/incremental_sort.sql index 3ee11c394b..cb48f200ce 100644 --- a/src/test/regress/sql/incremental_sort.sql +++ b/src/test/regress/sql/incremental_sort.sql @@ -220,6 +220,10 @@ explain (costs off) select a,b,sum(c) from t group by 1,2 order by 1,2,3 limit 1 set enable_hashagg to off; explain (costs off) select * from t union select * from t order by 1,3; +-- Full sort, not just incremental sort can be pushed below a gather merge path +-- by generate_useful_gather_paths. 
+explain (costs off) select distinct a,b from t; + drop table t; -- Sort pushdown can't go below where expressions are part of the rel target. -- 2.26.2 >From 8ed86b447c07bdce8fafe95cdfb1e68286443274 Mon Sep 17 00:00:00 2001 From: James Coleman Date: Fri, 20 Nov 2020 20:29:51 -0500 Subject: [PATCH 2/5] Check parallel safety in generate_useful_gather_paths Commit ebb7ae839d ensured we ignore pathkeys with volatile expressions when considering adding a sort below a Gather Merge. Turns out we need to care about parallel safety of the pathkeys too, otherwise we might try sorting e.g. on results of a correlated subquery (as demonstrated by a report from Luis Roberto). Initial investigation by Tom Lane, patch by James Coleman. Backpatch to 13, where the code was instroduced (as part of Incremental Sort). Reported-by: Luis Roberto Author: James Coleman Reviewed-by: Tomas Vondra Backpatch-through: 13 Discussion: https://postgr.es/m/622580997.37108180.1604080457319.JavaMail.zimbra%40siscobra.com.br Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs=hC0mSksZC=h5m8lschj5e5oxp...@mail.gmail.com --- src/backend/optimizer/path/allpaths.c | 13 -- src/backend/optimizer/path/equivclass.c | 9 - src/include/optimize
Re: Proposed patch for key managment
On Tue, Dec 15, 2020 at 04:02:12PM -0500, Bruce Momjian wrote: > I thought this was going to be a huge job, but once I looked at it, it > was clear exactly what you were saying. Comparing cryptohash.c and > cryptohash_openssl.c I saw exactly what you wanted, and I think I have > completed it in the attached patch. cryptohash.c implemented the hash > in C code if OpenSSL is not present --- I assume you didn't want me to > do that, but rather to split the API so it was easy to add another > implementation. Hmm. I don't think that this is right. First, this still mixes ciphers and HMAC. Second, it is only possible here to use HMAC with SHA512 but we want to be able to use SHA256 as well with SCRAM (per se the scram_HMAC_* business in scram-common.c). Third, this lacks a fallback implementation. Finally, pgcrypto is not touched, but we should be able to simplify the area around int_digest_list in px-hmac.c which lists for HMAC as available options MD5, SHA1 and all four SHA2. Please note that we have MD5 and SHA2 as choices in cryptohash.h, and I have already a patch to add SHA1 to the cryptohash set pending for review: https://commitfest.postgresql.org/31/2868/. If all that is done correctly, a lot of code can be cleaned up while making things still pluggable with various SSL libraries. -- Michael signature.asc Description: PGP signature
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On Tue, Dec 15, 2020 at 09:45:17PM -0300, Alvaro Herrera wrote: > I don't like this idea too much, because adding an option causes an ABI > break. I don't think we commonly add options in backbranches, but it > has happened. The bitmask is much easier to work with in that regard. ABI flexibility is a good point here. I did not consider this point of view. Thanks! -- Michael signature.asc Description: PGP signature
Re: Reloptions for table access methods
On Tue, Dec 15, 2020 at 03:59:02PM -0800, Jeff Davis wrote: > How should that work with the existing code? Should we have separate AM > hooks for each of these interaction points, and then have the AM figure > out how to handle its options? What about the toast.* options? It seems to me that we would want a callback for options that is part of TableAmRoutine to do the loading work, no? The bigger picture is more complex than that, though, because all in-core AMs rely on build_reloptions() to do the work, which itself depends on a set of pre-existing reloptions (all the static relOpts array). In the end, I'd like to think that we should remove the dependency between relopt build and initialization state, then switch all the different AMs to do something similar to create_reloptions_table() in dummy_index_am.c to define a new set of reloptions, except that we'd want to preload all the options at backend startup or something similar to that for all in-core AMs, for tables and indexes. > It feels like some common options would make sense to avoid too much > code duplication. Having a set of options that can be plugged into any AM, like a set of preset options for autovacuum or toast, makes sense. > I am not trying to push this in a specific direction, but I don't have > a lot of good answers, so I went for the simplest thing I could think > of that would allow an extension to have its own options, even if it's > a bit hacky. I'm open to alternatives. FWIW, I agree with Simon's feeling that bypassing a sanity check does not feel like a good solution in the long term. -- Michael signature.asc Description: PGP signature
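As a concrete reference for the dummy_index_am pattern mentioned above, here is a rough sketch of how an access method can register and parse its own reloptions with the existing in-core machinery. The signatures are recalled from memory and should be checked against access/reloptions.h; the option name and struct are invented, and this is not a drop-in extension.

#include "postgres.h"
#include "access/reloptions.h"

typedef struct MyAmOptions
{
    int32   vl_len_;            /* varlena header (do not touch directly) */
    int     compression_level;  /* example custom option */
} MyAmOptions;

static relopt_kind my_relopt_kind;

/* Called once at extension load time, e.g. from _PG_init(). */
void
my_am_register_options(void)
{
    my_relopt_kind = add_reloption_kind();
    add_int_reloption(my_relopt_kind, "compression_level",
                      "Compression level used by this AM",
                      3, 0, 9, AccessExclusiveLock);
}

/* The AM's relation-options callback, in the style of dummy_index_am.c. */
bytea *
my_am_options(Datum reloptions, bool validate)
{
    static const relopt_parse_elt tab[] = {
        {"compression_level", RELOPT_TYPE_INT,
         offsetof(MyAmOptions, compression_level)},
    };

    return (bytea *) build_reloptions(reloptions, validate, my_relopt_kind,
                                      sizeof(MyAmOptions), tab,
                                      lengthof(tab));
}

The open question in the thread is how to expose an equivalent hook for table AMs so that such options are loaded without bypassing the existing sanity checks on the static relOpts array.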
Re: enable_incremental_sort changes query behavior
On Tue, Dec 1, 2020 at 4:08 AM Anastasia Lubennikova wrote: > > On 01.12.2020 03:08, James Coleman wrote: > > On Tue, Nov 3, 2020 at 4:39 PM Tomas Vondra > > wrote: > >> I've pushed the 0001 part, i.e. the main fix. Not sure about the other > >> parts (comments and moving the code back to postgres_fdw) yet. > > I noticed the CF entry [1] got moved to the next CF; I'm thinking this > > entry should be marked as committed since the fix for the initial bug > > reported on this thread has been pushed. We have the parallel safety > > issue outstanding, but there's a separate thread and patch for that, > > so I'll make a new CF entry for that. > > > > I can mark it as committed, but I'm not sure how to "undo" (or if > > that's desirable) the move to the next CF. > > > > Thoughts? > > > > James > > > > 1: https://commitfest.postgresql.org/30/2754/ > > > > > Oops... > I must have rushed with this one, thank you for noticing. > I don't see how to move it back either. I think it's fine to mark it as > Committed where it is now. > BTW, I still see this one as needs review -- Jaime Casanova Professional PostgreSQL: Soporte 24x7 y capacitación
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On 2020-Dec-12, Peter Eisentraut wrote: > On 2020-12-11 21:27, Alvaro Herrera wrote: > > By the way-- What did you think of the idea of explictly marking the > > types used for bitmasks using types bits32 and friends, instead of plain > > int, which is harder to spot? > > If we want to make it clearer, why not turn the thing into a struct, as in > the attached patch, and avoid the bit fiddling altogether. I don't like this idea too much, because adding an option causes an ABI break. I don't think we commonly add options in backbranches, but it has happened. The bitmask is much easier to work with in that regard.
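For readers skimming the thread, a tiny standalone illustration of the trade-off Alvaro describes; the names are invented and do not match the actual ClusterOption code.

#include <stdbool.h>
#include <stdio.h>

/* As a bitmask, adding an option later is just a new flag value; the type
 * stays an int, so its size and the layout seen by extensions don't move. */
#define CLUOPT_RECHECK  (1 << 0)
#define CLUOPT_VERBOSE  (1 << 1)
/* #define CLUOPT_NEWTHING (1 << 2)      -- no ABI change */

/* As a struct, adding a member in a back-branch changes sizeof() (and
 * possibly padding), which breaks extensions compiled against the old
 * definition if they allocate or embed the struct. */
typedef struct ClusterOptionsV1
{
    bool    recheck;
    bool    verbose;
} ClusterOptionsV1;

typedef struct ClusterOptionsV2
{
    bool    recheck;
    bool    verbose;
    bool    newthing;           /* hypothetical later addition */
} ClusterOptionsV2;

int
main(void)
{
    printf("int: %zu bytes, struct v1: %zu bytes, struct v2: %zu bytes\n",
           sizeof(int), sizeof(ClusterOptionsV1), sizeof(ClusterOptionsV2));
    return 0;
}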
Re: Proposed patch for key managment
Hi Stephen, On Tue, Dec 15, 2020 at 02:09:09PM -0500, Stephen Frost wrote: > Yeah, looking at what's been done there seems like the right approach to > me as well, ideally we'd have one set of APIs that'll support all these > use cases (not unlike what curl does..). Ooh... This is interesting. What did curl do wrong here? It is always good to hear from such past experiences. -- Michael signature.asc Description: PGP signature
Re: Minor documentation error regarding streaming replication protocol
On Tue, Dec 15, 2020 at 12:54:51PM -0800, Andres Freund wrote: > Minor nit: I'd put this into common/string.[ch], besides > pg_clean_ascii(). Probably renaming it to pg_is_ascii(). Yeah. There is already one pg_is_ascii_string() in saslprep.c, is_all_ascii() in collationcmds.c and string_is_ascii() in pgp-pgsql.c. I don't think we need a fourth copy of the same logic, even if it is dead simple. -- Michael signature.asc Description: PGP signature
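Since the point here is consolidating several copies of the same check, a minimal standalone sketch of what such a helper boils down to follows; the real one would live in src/common/string.c and follow backend conventions, with pg_is_ascii() being the name suggested above.

#include <stdbool.h>
#include <stdio.h>

static bool
pg_is_ascii(const char *str)
{
    for (; *str != '\0'; str++)
        if ((unsigned char) *str > 127)
            return false;
    return true;
}

int
main(void)
{
    printf("%d %d\n", pg_is_ascii("replication"), pg_is_ascii("réplication"));
    return 0;
}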
Re: REINDEX backend filtering
On Tue, Dec 15, 2020 at 06:34:16PM +0100, Magnus Hagander wrote: > Is this really a common enough operation that we need it in the main grammar? > > Having the functionality, definitely, but what if it was "just" a > function instead? So you'd do something like: > SELECT 'reindex index ' || i FROM pg_blah(some, arguments, here) > \gexec > > Or even a function that returns the REINDEX commands directly (taking > a parameter to turn on/off concurrency for example). > > That also seems like it would be easier to make flexible, and just as > easy to plug into reindexdb? Having control in the grammar to choose which index to reindex for a table is very useful when it comes to parallel reindexing, because it is a no-brainer in terms of knowing which index to distribute to one job or another. In short, with this grammar you can just issue a set of REINDEX TABLE commands that we know will not conflict with each other. You cannot get that level of control with REINDEX INDEX, as two or more commands may conflict if they work on indexes of the same relation, because a lock must also be taken on the parent table. Of course, we could decide to implement redistribution logic in all frontend tools that need such things, like reindexdb, but that's not something I think we should leave to the client to decide. Backend-side filtering is IMO much simpler, less code, and more elegant. -- Michael signature.asc Description: PGP signature
Re: copy.sgml and partitioned tables
On Thu, Dec 3, 2020 at 03:17:23PM -0600, Justin Pryzby wrote: > https://www.postgresql.org/docs/current/sql-copy.html > |. COPY FROM can be used with plain, foreign, or partitioned tables or with > views that have INSTEAD OF INSERT triggers. > |. COPY only deals with the specific table named; IT DOES NOT COPY DATA TO OR > FROM CHILD TABLES. ... > > That language in commit 854b5eb51 was never updated since partitioning was > added, so I propose this. > > I'm not sure, but maybe it should still say that "COPY TO does not copy data > to > child tables of inheritance hierarchies." I reworded it slightly, attached, and applied it back to PG 10, where we added the partition syntax. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index 369342b74d..0fca6583af 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -413,10 +413,16 @@ COPY count Notes -COPY TO can only be used with plain tables, not -with views. However, you can write COPY (SELECT * FROM -viewname) TO ... -to copy the current contents of a view. +COPY TO can be used only with plain +tables, not views, and does not copy rows from child tables +or child partitions. For example, COPY table TO copies +the same rows as SELECT * FROM ONLY table. +The syntax COPY (SELECT * FROM table) TO ... can be used to +dump all of the rows in an inheritance hierarchy, partitioned table, +or view. @@ -425,16 +431,6 @@ COPY count INSTEAD OF INSERT triggers. - -COPY only deals with the specific table named; -it does not copy data to or from child tables. Thus for example -COPY table TO -shows the same data as SELECT * FROM ONLY table. But COPY -(SELECT * FROM table) TO ... -can be used to dump all of the data in an inheritance hierarchy. - - You must have select privilege on the table whose values are read by COPY TO, and
Re: Proposed patch for key managment
On Tue, Dec 15, 2020 at 11:13:12PM +, Alastair Turner wrote: > Since it's exciting stuff, I've been trying to lash together a PoC integration > with the crypto infrastructure we see at these customers. Unfortunately, in > short, the API doesn't seem to suit integration with HSM big iron, like > Thales, > Utimaco, (other options are available), or their cloudy lookalikes. > > I understand the model of wrapping the Data Encryption Key and storing the > wrapped key with the encrypted data. The thing I can't find support for in > your > patch is passing a wrapped DEK to an external system for decryption. A key > feature of these key management approaches is that the application handling > the > encrypted data doesn't get the KEK, the HSM/KSM passes the DEK back after > unwrapping it. Google have a neat illustration of their approach, which is > very > similar to others, at https://cloud.google.com/kms/docs/envelope-encryption# > how_to_decrypt_data_using_envelope_encryption > > So instead of calling out to a passphrase command which returns input for a > PBKDF, Postgres (in the form of initdb or postmaster) should be passing the > wrapped DEK and getting back the DEK in plain. The value passed from Postgres > to the external process doesn't even have to be a wrapped DEK, it could be a > key in the key->value sense, for use in a lookup against Vault, CredHub or the > Kubernetes secret store. Making the stored keys opaque to the Postgres > processes and pushing the task of turning them into cryptographic material to > an external hepler process probably wouldn't shrink the patch overall, but it > would move a lot of code from core processes into the helper. Maybe that > helper > should live in contrib, as discussed below, where it would hopefully be joined > by a bridge for KMIP and others. Well, I think we can go two ways with this. First, if you look at my attached Yubikey script, you will see that, at boot time, since $DIR/yubipass.key does not exit, the script generates a passphrase, wraps it with the Yubikey's PIV #3 public key, and stores it in the live key directory. The next time it is called, it decrypts the wrapped passphrase using the Yubikey, and uses that as the KEK to decrypt the two DEKs. Now, you could modify the script to call the HSM at boot time to retrieve a DEK wrapped in a KEK, and store it in the key directory. On all future calls, you can pass the DEK wrapped by KEK to the HSM, and use the returned DEK as the KEK/passphrase to decrypt the real DEK. The big value of this is that the keys are validated, and it uses the existing API. Ideally we write a sample script of how to this and include it in /contrib. The second approach is to make a new API for what you want. It would be similar to the cluster_passphrase_command, but it would be simple in some ways, but complex in others since we need at least two DEKs, and key rotation might be very risky since there isn't validation in the server. I don't think this can be accomplished by a contrib module, but would actually be easy to implement, but perhaps hard to document because the user API might be tricky. I think my big question is about your sentence, "A key feature of these key management approaches is that the application handling the encrypted data doesn't get the KEK, the HSM/KSM passes the DEK back after unwrapping it." It is less secure for the HSM to return a KEK rather than a DEK? I can't see why it would be. 
The application never sees the KEK used to wrap the DEK it has stored in the file system, though that DEK is actually used as a passphrase by Postgres. This is the same with the Yubikey --- Postgres never sees the private key used to unlock what it locked with the Yubikey public key. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee pass_yubi_nopin.sh Description: Bourne shell script
Re: Reloptions for table access methods
On Tue, 2020-12-15 at 17:37 +, Simon Riggs wrote: > > I definitely don't agree with the premise that all existing heap > options should be common across all new or extension tableAMs. There > are dozens of such options and I don't believe that they would all > have meaning in all future storage plugins. I agree in principle, but looking at the options that are present today, a lot of them are integrated into general database features, like autovacuum, toast, logical replication, and parallel query planning. We need to have some answer about how these features should interact with a custom AM if we can't assume anything about the reloptions it understands. > I think options should just work exactly the same for Indexes and > Tables. How should that work with the existing code? Should we have separate AM hooks for each of these interaction points, and then have the AM figure out how to handle its options? What about the toast.* options? It feels like some common options would make sense to avoid too much code duplication. I am not trying to push this in a specific direction, but I don't have a lot of good answers, so I went for the simplest thing I could think of that would allow an extension to have its own options, even if it's a bit hacky. I'm open to alternatives. Regards, Jeff Davis
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On Mon, Dec 14, 2020 at 06:14:17PM -0600, Justin Pryzby wrote: > On Sat, Dec 12, 2020 at 01:45:26PM -0600, Justin Pryzby wrote: > > On Sat, Dec 12, 2020 at 09:20:35AM +0100, Peter Eisentraut wrote: > > > On 2020-12-11 21:27, Alvaro Herrera wrote: > > > > By the way-- What did you think of the idea of explictly marking the > > > > types used for bitmasks using types bits32 and friends, instead of plain > > > > int, which is harder to spot? > > > > > > If we want to make it clearer, why not turn the thing into a struct, as in > > > the attached patch, and avoid the bit fiddling altogether. > > > > I like this. > > It's a lot like what I wrote as [PATCH v31 1/5] ExecReindex and > > ReindexParams > > In my v31 patch, I moved ReindexOptions to a private structure in > > indexcmds.c, > > with an "int options" bitmask which is passed to reindex_index() et al. > > Your > > patch keeps/puts ReindexOptions index.h, so it also applies to > > reindex_index, > > which I think is good. > > > > So I've rebased this branch on your patch. > > > > Some thoughts: > > > > - what about removing the REINDEXOPT_* prefix ? > > - You created local vars with initialization like "={}". But I thought it's > >needed to include at least one struct member like "={false}", or else > >they're not guaranteed to be zerod ? > > - You passed the structure across function calls. The usual convention is > > to > >pass a pointer. > > I think maybe Michael missed this message (?) > I had applied some changes on top of Peter's patch. > > I squished those commits now, and also handled ClusterOption and VacuumOption > in the same style. > > Some more thoughts: > - should the structures be named in plural ? "ReindexOptions" etc. Since > they >define *all* the options, not just a single bit. > - For vacuum, do we even need a separate structure, or should the members be >directly within VacuumParams ? It's a bit odd to write >params.options.verbose. Especially since there's also ternary options > which >are directly within params. > - Then, for cluster, I think it should be called ClusterParams, and > eventually >include the tablespaceOid, like what we're doing for Reindex. > > I am awaiting feedback on these before going further since I've done too much > rebasing with these ideas going back and forth and back. With Alexey's agreement, I propose something like this. I've merged some commits and kept separate the ones which are more likely to be disputed/amended. But it may be best to read the series as a single commit, like "git diff origin.." 
-- Justin >From ee702a6f49392e84bb51ec432d0974c7369b Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Tue, 15 Dec 2020 17:55:21 -0600 Subject: [PATCH 1/4] Convert options to struct: Reindex/Cluster/Vacuum Remove prefixes and lowercase structure member variables; Rename to plural: Vacuum/ClusterOptions; --- src/backend/access/heap/vacuumlazy.c | 8 +- src/backend/catalog/index.c | 26 +++--- src/backend/commands/analyze.c | 15 ++-- src/backend/commands/cluster.c | 29 +++--- src/backend/commands/indexcmds.c | 100 ++--- src/backend/commands/tablecmds.c | 4 +- src/backend/commands/vacuum.c| 128 +-- src/backend/postmaster/autovacuum.c | 14 +-- src/backend/tcop/utility.c | 10 +-- src/include/catalog/index.h | 16 ++-- src/include/commands/cluster.h | 10 +-- src/include/commands/defrem.h| 9 +- src/include/commands/vacuum.h| 29 +++--- 13 files changed, 198 insertions(+), 200 deletions(-) diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index 25f2d5df1b..6bfed2b2da 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -456,7 +456,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params, starttime = GetCurrentTimestamp(); } - if (params->options & VACOPT_VERBOSE) + if (params->options.verbose) elevel = INFO; else elevel = DEBUG2; @@ -484,7 +484,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params, xidFullScanLimit); aggressive |= MultiXactIdPrecedesOrEquals(onerel->rd_rel->relminmxid, mxactFullScanLimit); - if (params->options & VACOPT_DISABLE_PAGE_SKIPPING) + if (params->options.disable_page_skipping) aggressive = true; vacrelstats = (LVRelStats *) palloc0(sizeof(LVRelStats)); @@ -902,7 +902,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats, * be replayed on any hot standby, where it can be disruptive. */ next_unskippable_block = 0; - if ((params->options & VACOPT_DISABLE_PAGE_SKIPPING) == 0) + if (!params->options.disable_page_skipping) { while (next_unskippable_block < nblocks) { @@ -960,7 +960,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats, { /* Time to advance next_unskip
Re: Sorting case branches in outfuncs.c/outNode alphabetically
Fedir Panasenko writes: > outfuncs.c contains a switch statement responsible for choosing > serialization function per node type here: > https://github.com/postgres/postgres/blob/master/src/backend/nodes/outfuncs.c#L3711 Why are you concerned about outfuncs.c in particular? Its sibling files (copyfuncs, equalfuncs, etc) have much the same structure. > It spans over >650LOC and is quite unreadable, requiring using search > or code analysis tools for pretty much anything. But why exactly do you need to read it? It's boring boilerplate. > I'd like to sort these case branches alphabetically and I'd like to > get some input on that prior to submitting a patch. I'd be a strong -1 for alphabetical sort. To my mind, the entries here, and in other similar places, should match the order in which the struct types are declared in src/include/nodes/*nodes.h. And those are not sorted alphabetically, but (more or less) by functionality. I would *definitely* not favor a patch that arbitrarily re-orders those header files alphabetically. Now, IIRC the ordering in the backend/nodes/*.c files is not always a perfect match to the headers. I'd be good with a patch that makes them line up better. But TBH, that is just neatnik-ism; I still don't see how it makes any interesting difference to readability. Keep in mind also that various people have shown interest in auto-generating the backend/nodes/*.c files from the header declarations, in which case this discussion would be moot. > (readfuncs.c contains a similar construct for deserializing nodes, but > that one is if...else based as opposed to switch, so order there might > have performance implications -> I'd reserve that topic for separate > discussion). Yeah, I keep wondering when that structure is going to become a noticeable performance problem. There's little reason to think that we've ordered the node types by frequency there. At some point it might make sense to convert readfuncs' lookup logic into, say, a perfect hash table (cf src/tools/PerfectHash.pm). I'd certainly think that that would be a more useful activity than arguing over the switch order. regards, tom lane
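For what it's worth, the readfuncs lookup mentioned above could take roughly the following shape if it were generated with src/tools/PerfectHash.pm. This is only a sketch under that assumption: readfuncs_hash(), ReadFuncTable and dispatch_node_read() are hypothetical names, and none of this exists in the tree.

typedef Node *(*ReadNodeFunc) (void);

typedef struct ReadFuncEntry
{
	const char *label;			/* node tag as spelled in the string, e.g. "RANGETBLENTRY" */
	ReadNodeFunc read_fn;		/* the corresponding _readXxx() routine */
} ReadFuncEntry;

/* ReadFuncTable[] and readfuncs_hash() would be emitted by PerfectHash.pm */

static Node *
dispatch_node_read(const char *token, int length)
{
	int			h = readfuncs_hash(token, length);	/* generated perfect-hash lookup */

	/* the hash only proposes a slot; verify the label actually matches */
	if (h >= 0 && h < (int) lengthof(ReadFuncTable) &&
		strlen(ReadFuncTable[h].label) == (size_t) length &&
		memcmp(ReadFuncTable[h].label, token, length) == 0)
		return ReadFuncTable[h].read_fn();

	elog(ERROR, "badly formatted node string \"%.32s\"...", token);
	return NULL;				/* keep compiler quiet */
}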
Re: Proposed patch for key managment
Hi Bruce et al Firstly, thanks for shaping the patch, getting it down to a manageable scope of cluster file encryption. I think this is a great feature and it matters to a lot of the customers I talk to at VMware about adopting Postgres. Since it's exciting stuff, I've been trying to lash together a PoC integration with the crypto infrastructure we see at these customers. Unfortunately, in short, the API doesn't seem to suit integration with HSM big iron, like Thales, Utimaco, (other options are available), or their cloudy lookalikes. I understand the model of wrapping the Data Encryption Key and storing the wrapped key with the encrypted data. The thing I can't find support for in your patch is passing a wrapped DEK to an external system for decryption. A key feature of these key management approaches is that the application handling the encrypted data doesn't get the KEK, the HSM/KSM passes the DEK back after unwrapping it. Google have a neat illustration of their approach, which is very similar to others, at https://cloud.google.com/kms/docs/envelope-encryption#how_to_decrypt_data_using_envelope_encryption So instead of calling out to a passphrase command which returns input for a PBKDF, Postgres (in the form of initdb or postmaster) should be passing the wrapped DEK and getting back the DEK in plain. The value passed from Postgres to the external process doesn't even have to be a wrapped DEK, it could be a key in the key->value sense, for use in a lookup against Vault, CredHub or the Kubernetes secret store. Making the stored keys opaque to the Postgres processes and pushing the task of turning them into cryptographic material to an external helper process probably wouldn't shrink the patch overall, but it would move a lot of code from core processes into the helper. Maybe that helper should live in contrib, as discussed below, where it would hopefully be joined by a bridge for KMIP and others. On Tue, 15 Dec 2020 at 19:09, Stephen Frost wrote: > Greetings, > > * Bruce Momjian (br...@momjian.us) wrote: > > On Tue, Dec 15, 2020 at 10:05:49AM +0900, Michael Paquier wrote: > > > On Mon, Dec 14, 2020 at 06:06:15PM -0500, Bruce Momjian wrote: > > > > There are a few shell script I should include to show how to create > > > > commands. Where should they be stored? /contrib module? > > > > > > Why are these needed. Is it a matter of documentation? > > > > I have posted some of the scripts. They involved complex shell > > scripting that I doubt the average user can do. The scripts are for > > prompting for a passphrase from the user's terminal, or using the > > Yubikey, with our without typing a pin. I can put them in the docs and > > then users can copy them into a file. Is that the preferred method? > > If we are going to include these in core anywhere, I would think a new > directory under contrib would make the most sense. I'd hope that we > could find a way to automate the testing of them though, so that we have > some idea when they break because otherwise I'd be concerned that > they'll break somewhere down the line and we won't notice for quite a > while. > > Regards Alastair
Sorting case branches in outfuncs.c/outNode alphabetically
Hi all, outfuncs.c contains a switch statement responsible for choosing serialization function per node type here: https://github.com/postgres/postgres/blob/master/src/backend/nodes/outfuncs.c#L3711 It spans over >650LOC and is quite unreadable, requiring using search or code analysis tools for pretty much anything. I'd like to sort these case branches alphabetically and I'd like to get some input on that prior to submitting a patch. Obvious benefit would be increase in readability, with the downside of somewhat messing up the git history. (readfuncs.c contains a similar construct for deserializing nodes, but that one is if...else based as opposed to switch, so order there might have performance implications -> I'd reserve that topic for separate discussion). --- Best regards, Fedir
Re: Minor documentation error regarding streaming replication protocol
On Tue, Dec 15, 2020 at 01:22:44AM -0800, Jeff Davis wrote: > On Thu, 2020-12-03 at 13:14 -0500, Bruce Momjian wrote: > > Andres objected (in a separate conversation) to forcing a binary- > > > format > > > value on a client that didn't ask for one. He suggested that we > > > mandate > > > that the data is ASCII-only (for both filename and content), > > > closing > > > the gap Michael raised[1]; and then just declare all values to be > > > text > > > format. > > > > How do we mandate that? Just mention it in the docs and C comments? > > Attached a simple patch that enforces an all-ASCII restore point name > rather than documenting the possibility of non-ASCII text. Oh, nice idea, sure. Would you like to apply it? Master only or backpatch? -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: SELECT INTO deprecation
On Wed, Dec 9, 2020 at 09:48:54PM +0100, Peter Eisentraut wrote: > On 2020-12-03 20:26, Peter Eisentraut wrote: > > On 2020-12-03 16:34, Tom Lane wrote: > > > As I recall, a whole lot of the pain we have with INTO has to do > > > with the semantics we've chosen for INTO in a set-operation nest. > > > We think you can write something like > > > > > > SELECT ... INTO foo FROM ... UNION SELECT ... FROM ... > > > > > > but we insist on the INTO being in the first component SELECT. > > > I'd like to know exactly how much of that messiness is shared > > > by SQL Server. > > > > On sqlfiddle.com, this works: > > > > select a into t3 from t1 union select a from t2; > > > > but this gets an error: > > > > select a from t1 union select a into t4 from t2; > > > > SELECT INTO must be the first query in a statement containing a UNION, > > INTERSECT or EXCEPT operator. > > So, with that in mind, here is an alternative proposal that points out that > SELECT INTO has some use for compatibility. Do we really want to carry around confusing syntax for compatibility? I doubt we would ever add INTO now, even for compatibility. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: pg_rewind copies
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: not tested Hello The patch seems to do as described and the regression and TAP tests are passing + /* +* A local source is not expected to change while we're rewinding, so check +* that we size of the file matches our earlier expectation. +*/ Is this a typo? thanks Cary
Re: Proposed patch for key managment
On Tue, Dec 15, 2020 at 02:09:09PM -0500, Stephen Frost wrote: > Greetings, > > * Bruce Momjian (br...@momjian.us) wrote: > > On Tue, Dec 15, 2020 at 10:05:49AM +0900, Michael Paquier wrote: > > > On Mon, Dec 14, 2020 at 06:06:15PM -0500, Bruce Momjian wrote: > > > > I am getting close to applying these patches, probably this week. The > > > > patches are cumulative: > > > > > > > > > > > > https://github.com/postgres/postgres/compare/master...bmomjian:key.diff > > > > > > > > https://github.com/bmomjian/postgres/compare/key...bmomjian:key-alter.diff > > > > > > I strongly object to a commit based on the current state of the patch. > > Yeah, I agree that this could use some work though I don't think it's > all that far away from being something we can get in, to allow us to > move forward with the other work around supporting TDE. Glad to hear that. Michael Paquier was right that the common/cipher*.c API was wrong and should not be committed until fixed, which I think it has been. If there are other messy parts of this patch, I would like to fix those too. > > I have posted some of the scripts. They involved complex shell > > scripting that I doubt the average user can do. The scripts are for > > prompting for a passphrase from the user's terminal, or using the > > Yubikey, with our without typing a pin. I can put them in the docs and > > then users can copy them into a file. Is that the preferred method? > > If we are going to include these in core anywhere, I would think a new > directory under contrib would make the most sense. I'd hope that we > could find a way to automate the testing of them though, so that we have > some idea when they break because otherwise I'd be concerned that > they'll break somewhere down the line and we won't notice for quite a > while. I am doing automated testing here, but I have a Yubikey. Michael Paquier recommended TAP tests, I guess like we do for pg_upgrade, and I will look into that. > This doesn't seem like something that would make sense to only have in > the documentation, which isn't a terribly convenient way to make use of > them. Yeah, it is hard to cut-paste 60 lines. The /contrib directory would be for SSL and cluster file encryption passphrase control, so it should have more general usefulness. Would those file be copied to the install directory somewhere if you install /contrib? Do we have any cases of this? > > The point is that the command-line options will be active, but will not > > be documented. It will not do anything on its own unless you specify > > that command-line option. I can comment-out the command-line options > > from being active but the code it going to look very messy. > > I'm a bit concerned with what's being contemplated here.. Ideally, > we'll actually get everything committed for v14 but if we don't and this > doesn't serve any use-case then I'm not sure that it should be > included in the release. I also don't like committing and reverting > things either, but having command line options that aren't documented > but which exist in the code isn't great either, nor is having random > code commented out.. Agreed. Once we use this code for SSL passphrase prompting, many of the options will actually have usefulness. What we could do is to not hide anything, including the docs, and then hide them once we are ready to start beta. At that point, maybe putting in the #ifdefs will be acceptable, since we would not be working on the feature until we branch, and then we can just remove them again. 
We had similar issues with the Win32 port, but that had fewer visible knobs. > > > I think that you should attach such patches directly to the emails > > > sent to pgsql-hackers, if those github links disappear for some > > > reason, it would become impossible to refer to see what has been > > > discussed here. > > > > Well, the patches are changing frequently. I am attaching a version to > > this email. > > I agree with having the patches posted to the list. I don't really > agree with the argument of "they change frequently". Sure, they are only 35k compressed. My point is that I am modifying my git tree all day, and with github, I can easily push the changes and when someone looks at the diff, they see the most recent version. > > I think that designing a correct set of APIs that can be plugged with > > any SSL library is the correct move in the long term. I have on my > > agenda to clean up HMAC as SCRAM uses that with SHA256 and you would > > use that with SHA512. Daniel has mentioned that he has been touching > > this area, and I also got a patch halfly done though pgcrypto needs > > some extra thoughts. So this is still WIP but you could reuse that > > here. > > Yeah, looking at what's been done there seems like the right approach to > me as well, ideally we'd have one set of APIs that'll support all these > use cases (not unlike what curl does..). I think I accomplished this in a p
Re: Proposed patch for key managment
On Tue, Dec 15, 2020 at 02:20:33PM +0900, Michael Paquier wrote: > On Mon, Dec 14, 2020 at 10:19:02PM -0500, Bruce Momjian wrote: > > I am going to need someone to help me make these changes. I don't feel > > I know enough about the crypto API to do it, and it will take me 1+ week > > to learn it. > > I think that designing a correct set of APIs that can be plugged with > any SSL library is the correct move in the long term. I have on my > agenda to clean up HMAC as SCRAM uses that with SHA256 and you would > use that with SHA512. Daniel has mentioned that he has been touching > this area, and I also got a patch halfly done though pgcrypto needs > some extra thoughts. So this is still WIP but you could reuse that > here. I thought this was going to be a huge job, but once I looked at it, it was clear exactly what you were saying. Comparing cryptohash.c and cryptohash_openssl.c I saw exactly what you wanted, and I think I have completed it in the attached patch. cryptohash.c implemented the hash in C code if OpenSSL is not present --- I assume you didn't want me to do that, but rather to split the API so it was easy to add another implementation. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee key-alter.diff.gz Description: application/gzip
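For readers following along, the shape of the split being discussed is roughly: one header declaring an opaque context plus a small set of entry points, an OpenSSL implementation wrapping the EVP API, and a fallback for builds without OpenSSL. The declarations below are only a sketch; the names are modeled on the pg_cipher_ctx_create() mentioned elsewhere in the thread, and the exact signatures are assumptions rather than the patch's code.

/* cipher.h -- shared declarations (sketch; signatures are assumptions) */
typedef struct PgCipherCtx PgCipherCtx;		/* opaque, defined per implementation */

extern PgCipherCtx *pg_cipher_ctx_create(int cipher,
										 const unsigned char *key,
										 int klen_bytes);
extern bool pg_cipher_encrypt(PgCipherCtx *ctx,
							  const unsigned char *plaintext, int inlen,
							  unsigned char *ciphertext, int *outlen,
							  const unsigned char *iv);
extern void pg_cipher_ctx_free(PgCipherCtx *ctx);

/*
 * cipher_openssl.c would implement these on top of the EVP_* API, while
 * cipher.c would either carry a built-in fallback (as cryptohash.c does
 * for SHA-2) or simply report that the build lacks OpenSSL support.
 */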
Re: Minor documentation error regarding streaming replication protocol
Hi, On 2020-12-15 01:22:44 -0800, Jeff Davis wrote: > Attached a simple patch that enforces an all-ASCII restore point name > rather than documenting the possibility of non-ASCII text. +1 > diff --git a/src/backend/access/transam/xlogfuncs.c > b/src/backend/access/transam/xlogfuncs.c > index 290658b22c..48daed56f6 100644 > --- a/src/backend/access/transam/xlogfuncs.c > +++ b/src/backend/access/transam/xlogfuncs.c > @@ -44,6 +44,8 @@ > static StringInfo label_file; > static StringInfo tblspc_map_file; > > +static bool is_all_ascii(const char *str); Minor nit: I'd put this into common/string.[ch], beside pg_clean_ascii(). Probably renaming it to pg_is_ascii(). Greetings, Andres Freund
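A minimal sketch of such a helper, assuming the suggested pg_is_ascii() name and the move into src/common/string.c (IS_HIGHBIT_SET comes from c.h); this is an illustration, not the actual patch:

#ifndef FRONTEND
#include "postgres.h"
#else
#include "postgres_fe.h"
#endif

/* Return true if the string contains only 7-bit ASCII characters. */
bool
pg_is_ascii(const char *str)
{
	for (const char *p = str; *p != '\0'; p++)
	{
		if (IS_HIGHBIT_SET(*p))
			return false;
	}
	return true;
}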
Re: Rethinking plpgsql's assignment implementation
I realized that the speedup patch I posted yesterday is flawed: it's too aggressive about applying the R/W param mechanism, instead of not aggressive enough. To review, the point of that logic is that if we have an assignment like arrayvar := array_append(arrayvar, some-scalar-expression); a naive implementation would have array_append construct an entire new array, which we'd then have to copy into plpgsql's variable storage. Instead, if the array variable is in expanded-array format (which plpgsql encourages it to be) then we can pass the array parameter as a "read/write expanded datum", which array_append recognizes as license to scribble right on its input and return the modified input; that takes only O(1) time not O(N). Then plpgsql's assignment code notices that the expression result datum is the same pointer already stored in the variable, so it does nothing. With the patch at hand, a subscripted assignment a[i] := x becomes, essentially, a := subscriptingref(a, i, x); and we need to make the same sort of transformation to allow array_set_element to scribble right on the original value of "a" instead of making a copy. However, we can't simply not consider the source expression "x", as I proposed yesterday. For example, if we have a := subscriptingref(a, i, f(array_append(a, x))); it's not okay for array_append() to scribble on "a". The R/W param mechanism normally disallows any additional references to the target variable, which would prevent this error, but I broke that safety check with the 0007 patch. After thinking about this awhile, I decided that plpgsql's R/W param mechanism is really misdesigned. Instead of requiring the assignment source expression to be such that *all* its references to the target variable could be passed as R/W, we really want to identify *one* reference to the target variable to be passed as R/W, allowing any other ones to be passed read/only as they would be by default. As long as the R/W reference is a direct argument to the top-level (hence last to be executed) function in the expression, there is no harm in R/O references being passed to other lower parts of the expression. Nor is there any use-case for more than one argument of the top-level function being R/W. So the attached rewrite of the 0007 patch reimplements that logic to identify one single Param that references the target variable, and make only that Param pass a read/write reference, not any other Params referencing the target variable. This is a good change even without considering the assignment-reimplementation proposal, because even before this patchset we could have cases like arrayvar := array_append(arrayvar, arrayvar[i]); The existing code would be afraid to optimize this, but it's in fact safe. I also re-attach the 0001-0006 patches, which have not changed, just to keep the cfbot happy. regards, tom lane diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 1fa9f19f08..9e7f1590b5 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -12095,7 +12095,7 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd, * parse_analyze() or the rewriter, but instead we need to pass them * through parse_utilcmd.c to make them ready for execution. 
*/ - raw_parsetree_list = raw_parser(cmd); + raw_parsetree_list = raw_parser(cmd, RAW_PARSE_DEFAULT); querytree_list = NIL; foreach(list_item, raw_parsetree_list) { diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 8f341ac006..88c76dd985 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -723,6 +723,15 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); */ %token NOT_LA NULLS_LA WITH_LA +/* + * The grammar likewise thinks these tokens are keywords, but they are never + * generated by the scanner. Rather, they can be injected by parser.c as + * the initial token of the string (using the lookahead-token mechanism + * implemented there). This provides a way to tell the grammar to parse + * something other than the usual list of SQL commands. + */ +%token MODE_TYPE_NAME + /* Precedence: lowest to highest */ %nonassoc SET/* see relation_expr_opt_alias */ @@ -787,11 +796,18 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); /* * The target production for the whole parse. + * + * Ordinarily we parse a list of statements, but if we see one of the + * special MODE_XXX symbols as first token, we parse something else. */ stmtblock: stmtmulti { pg_yyget_extra(yyscanner)->parsetree = $1; } + | MODE_TYPE_NAME Typename + { +pg_yyget_extra(yyscanner)->parsetree = list_make1($2); + } ; /* diff --git a/src/backend/parser/parse_type.c b/src/backend/parser/parse_type.c index 2709f6f9c7..633774b733 100644 --- a/src/backend/parser/parse_type.c +++ b/src/backend
Re: SQL/JSON: functions
út 15. 12. 2020 v 19:56 odesílatel Oleg Bartunov napsal: > On Tue, Dec 15, 2020 at 8:37 PM Pavel Stehule > wrote: > > > > > > > > út 15. 12. 2020 v 18:00 odesílatel Simon Riggs > napsal: > >> > >> On Fri, 17 Jul 2020 at 21:26, Nikita Glukhov > wrote: > >> > > >> > Attached 50th version of the patches. Only the documentation was > changed > >> > since the previous version. > >> > >> I can imagine the effort required to get to v50, so I salute your > efforts. > >> > >> The document for SQL Standard has now been published as CD > >> 9075-2-Foundation (Thanks Peter). > >> > >> That gives us a clearer picture of what is being voted on and should > >> allow Nikita to complete his work. > >> > >> I suggest we move forwards on this now, but if anyone objects to > >> including this in PG14 in favour of waiting for the vote, please say > >> so clearly so we can skip to PG15. > > > > > > Maybe this set of patches can be reorganized and divided. Some parts > like json generating functions are almost trivial and without > controversions with clean benefits for users. > > I agree with this, most interesting is JSON_TABLE. > It is very interesting, but it is very complex too. There is not any similarly complex function in ANSI SQL. This function defines its own language. > > > The most complexity is related to json_table function. Nikita did a very > good job and implemented this function in maximal conformance with ANSI SQL > with a maximal set of features. On second hand it is hard to do review > because this patch is really complex, and a lot of functionality was not > implemented elsewhere (so isn't possible to compare results). I think it > should be possible to reduce complexity and divide acceptance of json_table > to some steps like basic functionality (on MySQL level), enhanced > functionality (on Oracle level), and full functionality (the Postgres will > be first). This functionality is interesting and maximal conformity with > SQL/JSON is great so I am for merging it. But if it will be divided into > some chronological steps, then there can be higher probability of merging > to upstream in a good time. > > I think it's shame to look up on MySQL and Oracle, since we have much > better and complete implementation of the Standard. > Maybe I used bad words. I would not reduce json_table patch to MySQL or Oracle level. I proposed merging this patch in a few steps when any step can be functional. Standard divides JSON supports to some levels too. There is about 50% code (and features) when review is not a problem, because this functionality is very clean and natural. Another 50% can be possibly problematic because Nikita implementation is first in the world and the description in standard is complex and hard to read, and hard to test. Because these patches are not divided we have to do lot of repeated work in every cycle. I proposed to do work more in style step by step than in big bang style. I think there is a lot of code that can be commitable immediately (maybe half). This can be done quickly without any controversies. This reduces complexity for 50% so we can concentrate better on the rest of patches. The final target is full support of standard and full merge of Nikita's patches. Nikita did hard and good work and it is nonsense to throw away any part of his work - and it is a pity so the merging process is too long already . But I understand it is pretty hard to commit to this patch in complexity like this patch has. 
> > > > > > > Regards > > > > Pavel > > > >> > >> -- > >> Simon Riggshttp://www.EnterpriseDB.com/ > > > > -- > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company >
Re: SQL/JSON: functions
On Tue, Dec 15, 2020 at 8:37 PM Pavel Stehule wrote: > > > > út 15. 12. 2020 v 18:00 odesílatel Simon Riggs napsal: >> >> On Fri, 17 Jul 2020 at 21:26, Nikita Glukhov wrote: >> > >> > Attached 50th version of the patches. Only the documentation was changed >> > since the previous version. >> >> I can imagine the effort required to get to v50, so I salute your efforts. >> >> The document for SQL Standard has now been published as CD >> 9075-2-Foundation (Thanks Peter). >> >> That gives us a clearer picture of what is being voted on and should >> allow Nikita to complete his work. >> >> I suggest we move forwards on this now, but if anyone objects to >> including this in PG14 in favour of waiting for the vote, please say >> so clearly so we can skip to PG15. > > > Maybe this set of patches can be reorganized and divided. Some parts like > json generating functions are almost trivial and without controversions with > clean benefits for users. I agree with this, most interesting is JSON_TABLE. > > The most complexity is related to json_table function. Nikita did a very good > job and implemented this function in maximal conformance with ANSI SQL with a > maximal set of features. On second hand it is hard to do review because this > patch is really complex, and a lot of functionality was not implemented > elsewhere (so isn't possible to compare results). I think it should be > possible to reduce complexity and divide acceptance of json_table to some > steps like basic functionality (on MySQL level), enhanced functionality (on > Oracle level), and full functionality (the Postgres will be first). This > functionality is interesting and maximal conformity with SQL/JSON is great so > I am for merging it. But if it will be divided into some chronological steps, > then there can be higher probability of merging to upstream in a good time. I think it's shame to look up on MySQL and Oracle, since we have much better and complete implementation of the Standard. > > Regards > > Pavel > >> >> -- >> Simon Riggshttp://www.EnterpriseDB.com/ -- Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: SQL/JSON: functions
On Tue, Dec 15, 2020 at 8:01 PM Simon Riggs wrote: > > On Fri, 17 Jul 2020 at 21:26, Nikita Glukhov wrote: > > > > Attached 50th version of the patches. Only the documentation was changed > > since the previous version. > > I can imagine the effort required to get to v50, so I salute your efforts. > > The document for SQL Standard has now been published as CD > 9075-2-Foundation (Thanks Peter). SQL-2016 is already 4 years old and this is what we are using for our development. What is CD 9075-2-Foundation? > > That gives us a clearer picture of what is being voted on and should > allow Nikita to complete his work. > > I suggest we move forwards on this now, but if anyone objects to > including this in PG14 in favour of waiting for the vote, please say > so clearly so we can skip to PG15. There is an upcoming JSON v2 SQL Standard, which specifies a JSON data type; that means we have to protect our users, or SQL-compliant applications will be slow and developers will be disappointed. I discussed all of this in my talk at Postgres Build http://www.sai.msu.su/~megera/postgres/talks/json-build-2020.pdf Andrew Dunstan has volunteered to work with us on completing the SQL/JSON standard, but ... I think we need SQL/JSON functions in PG14 and a SQL/JSON compatibility mode for json/jsonb. > > -- > Simon Riggshttp://www.EnterpriseDB.com/ -- Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Proposed patch for key managment
Greetings, * Bruce Momjian (br...@momjian.us) wrote: > On Tue, Dec 15, 2020 at 10:05:49AM +0900, Michael Paquier wrote: > > On Mon, Dec 14, 2020 at 06:06:15PM -0500, Bruce Momjian wrote: > > > I am getting close to applying these patches, probably this week. The > > > patches are cumulative: > > > > > > https://github.com/postgres/postgres/compare/master...bmomjian:key.diff > > > > > > https://github.com/bmomjian/postgres/compare/key...bmomjian:key-alter.diff > > > > I strongly object to a commit based on the current state of the patch. Yeah, I agree that this could use some work though I don't think it's all that far away from being something we can get in, to allow us to move forward with the other work around supporting TDE. > > > There are a few shell script I should include to show how to create > > > commands. Where should they be stored? /contrib module? > > > > Why are these needed. Is it a matter of documentation? > > I have posted some of the scripts. They involved complex shell > scripting that I doubt the average user can do. The scripts are for > prompting for a passphrase from the user's terminal, or using the > Yubikey, with our without typing a pin. I can put them in the docs and > then users can copy them into a file. Is that the preferred method? If we are going to include these in core anywhere, I would think a new directory under contrib would make the most sense. I'd hope that we could find a way to automate the testing of them though, so that we have some idea when they break because otherwise I'd be concerned that they'll break somewhere down the line and we won't notice for quite a while. This doesn't seem like something that would make sense to only have in the documentation, which isn't a terribly convenient way to make use of them. > > > Are people okay with having the feature enabled, but invisible > > > since the docs and --help output are missing? When we enable > > > ssl_passphrase_command to prompt from the terminal, some of the > > > command-line options will be useful. > > > > You are suggesting to enable the feature by default, make it invisible > > to the users and leave it undocumented? Is there something I missing > > here? > > The point is that the command-line options will be active, but will not > be documented. It will not do anything on its own unless you specify > that command-line option. I can comment-out the command-line options > from being active but the code it going to look very messy. I'm a bit concerned with what's being contemplated here.. Ideally, we'll actually get everything committed for v14 but if we don't and this doesn't serve any use-case then I'm not sure that it should be included in the release. I also don't like committing and reverting things either, but having command line options that aren't documented but which exist in the code isn't great either, nor is having random code commented out.. > > > Do people like the command-letter choices? > > > > > > I called the alter passphrase utility pg_altercpass. I could > > > have called it pg_clusterpass, but I wanted to highlight it is > > > only for changing the passphrase, not for creating them. > > > > I think that you should attach such patches directly to the emails > > sent to pgsql-hackers, if those github links disappear for some > > reason, it would become impossible to refer to see what has been > > discussed here. > > Well, the patches are changing frequently. I am attaching a version to > this email. I agree with having the patches posted to the list. 
I don't really agree with the argument of "they change frequently". * Michael Paquier (mich...@paquier.xyz) wrote: > On Mon, Dec 14, 2020 at 10:19:02PM -0500, Bruce Momjian wrote: > > On Tue, Dec 15, 2020 at 10:05:49AM +0900, Michael Paquier wrote: > >> - The cipher split also should be done in its own patch, and reviewed > >> on its own. There is a mix of dependencies between non-OpenSSL and > >> OpenSSL code paths, making the pluggin of a new SSL library harder to > >> do. In short, this requires proper API design, which is not something > >> we have here. There are also no changes in pgcrypto for that stuff. > > > > I am going to need someone to help me make these changes. I don't feel > > I know enough about the crypto API to do it, and it will take me 1+ week > > to learn it. > > I think that designing a correct set of APIs that can be plugged with > any SSL library is the correct move in the long term. I have on my > agenda to clean up HMAC as SCRAM uses that with SHA256 and you would > use that with SHA512. Daniel has mentioned that he has been touching > this area, and I also got a patch halfly done though pgcrypto needs > some extra thoughts. So this is still WIP but you could reuse that > here. Yeah, looking at what's been done there seems like the right approach to me as well, ideally we'd have one set of APIs that'll support all these use cases (not unlik
Re: Minimal logical decoding on standbys
On Wed, Mar 18, 2020 at 4:50 PM Alvaro Herrera wrote: > > > well, not "forever", but: > > $ make check PROVE_TESTS=t/019_standby_logical_decoding_conflicts.pl PROVE_FLAGS=-v > ... > cd /pgsql/source/master/src/test/recovery && TESTDIR='/home/alvherre/mnt/crypt/alvherre/Code/pgsql/build/master/src/test/recovery' PATH="/pgsql/build/master/tmp_install/pgsql/install/master/bin:$PATH" LD_LIBRARY_PATH="/pgsql/build/master/tmp_install/pgsql/install/master/lib" PGPORT='655432' PG_REGRESS='/home/alvherre/mnt/crypt/alvherre/Code/pgsql/build/master/src/test/recovery/../../../src/test/regress/pg_regress' REGRESS_SHLIB='/pgsql/build/master/src/test/regress/regress.so' /usr/bin/prove -I /pgsql/source/master/src/test/perl/ -I /pgsql/source/master/src/test/recovery -v t/ 019_standby_logical_decoding_conflicts.pl > t/019_standby_logical_decoding_conflicts.pl .. > 1..24 > ok 1 - dropslot on standby created > ok 2 - activeslot on standby created > # poll_query_until timed out executing this query: > # SELECT '0/35C9190' <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = 'standby'; > # expecting this output: > # t > # last actual query output: > # > # with stderr: > Bailout called. Further testing stopped: system pg_ctl failed > Bail out! system pg_ctl failed > FAILED--Further testing stopped: system pg_ctl failed > make: *** [Makefile:19: check] Error 255 > After rebase and did minimal tweaks (duplicated oid, TAP tests numbering) I'm facing similar problem but in other place: make -C src/test/recovery check PROVE_TESTS=t/ 023_standby_logical_decoding_conflicts.pl PROVE_FLAGS=-v ... /usr/bin/mkdir -p '/data/src/pg/main/src/test/recovery'/tmp_check cd . && TESTDIR='/data/src/pg/main/src/test/recovery' PATH="/d/src/pg/main/tmp_install/home/fabrizio/pgsql/logical-decoding-standby/bin:$PATH" LD_LIBRARY_PATH="/d/src/pg/main/tmp_install/home/fabrizio/pgsql/logical-decoding-standby/lib" PGPORT='65432' PG_REGRESS='/data/src/pg/main/src/test/recovery/../../../src/test/regress/pg_regress' REGRESS_SHLIB='/d/src/pg/main/src/test/regress/regress.so' /usr/bin/prove -I ../../../src/test/perl/ -I . -v t/ 023_standby_logical_decoding_conflicts.pl t/023_standby_logical_decoding_conflicts.pl .. 1..24 ok 1 - dropslot on standby created ok 2 - activeslot on standby created not ok 3 - dropslot on standby dropped # Failed test 'dropslot on standby dropped' # at t/023_standby_logical_decoding_conflicts.pl line 67. # got: 'logical' # expected: '' not ok 4 - activeslot on standby dropped # Failed test 'activeslot on standby dropped' # at t/023_standby_logical_decoding_conflicts.pl line 68. # got: 'logical' # expected: '' TAP tests hang forever in `check_slots_dropped` exactly here: # our client should've terminated in response to the walsender error eval { $slot_user_handle->finish; }; Regards, -- Fabrízio de Royes Mello PostgreSQL Developer at OnGres Inc. - https://ongres.com From a3e42f2fc53afd2bcbe6da9d029d91694a34562e Mon Sep 17 00:00:00 2001 From: Amit Khandekar Date: Thu, 16 Jan 2020 10:05:15 +0530 Subject: [PATCH v7 1/5] Allow logical decoding on standby. Allow a logical slot to be created on standby. Restrict its usage or its creation if wal_level on primary is less than logical. During slot creation, it's restart_lsn is set to the last replayed LSN. Effectively, a logical slot creation on standby waits for an xl_running_xact record to arrive from primary. Conflicting slots would be handled in next commits. Andres Freund and Amit Khandekar. 
--- src/backend/access/transam/xlog.c | 11 + src/backend/replication/logical/decode.c | 22 - src/backend/replication/logical/logical.c | 37 --- src/backend/replication/slot.c| 57 +++ src/backend/replication/walsender.c | 10 ++-- src/include/access/xlog.h | 1 + 6 files changed, 98 insertions(+), 40 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 8dd225c2e1..609edbaca6 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -5038,6 +5038,17 @@ LocalProcessControlFile(bool reset) ReadControlFile(); } +/* + * Get the wal_level from the control file. For a standby, this value should be + * considered as its active wal_level, because it may be different from what + * was originally configured on standby. + */ +WalLevel +GetActiveWalLevel(void) +{ + return ControlFile->wal_level; +} + /* * Initialization of shared memory for XLOG */ diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c index 3f84ee99b8..bb7c80d6cc 100644 --- a/src/backend/replication/logical/decode.c +++ b/src/backend/replication/logical/decode.c @@ -203,11 +203,31 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) * can restart from there. */ break; + case
Re: Reloptions for table access methods
On Tue, 1 Sept 2020 at 18:21, Jeff Davis wrote: > I went with the simple approach because fixing that problem seemed a > bit over-engineered. Here are some thoughts on what we could do: The simple patch is admirable, but not something we should put into core. I definitely don't agree with the premise that all existing heap options should be common across all new or extension tableAMs. There are dozens of such options and I don't believe that they would all have meaning in all future storage plugins. I think options should just work exactly the same for Indexes and Tables. -- Simon Riggshttp://www.EnterpriseDB.com/
Re: SQL/JSON: functions
út 15. 12. 2020 v 18:00 odesílatel Simon Riggs napsal: > On Fri, 17 Jul 2020 at 21:26, Nikita Glukhov > wrote: > > > > Attached 50th version of the patches. Only the documentation was changed > > since the previous version. > > I can imagine the effort required to get to v50, so I salute your efforts. > > The document for SQL Standard has now been published as CD > 9075-2-Foundation (Thanks Peter). > > That gives us a clearer picture of what is being voted on and should > allow Nikita to complete his work. > > I suggest we move forwards on this now, but if anyone objects to > including this in PG14 in favour of waiting for the vote, please say > so clearly so we can skip to PG15. > Maybe this set of patches can be reorganized and divided. Some parts like json generating functions are almost trivial and without controversions with clean benefits for users. The most complexity is related to json_table function. Nikita did a very good job and implemented this function in maximal conformance with ANSI SQL with a maximal set of features. On second hand it is hard to do review because this patch is really complex, and a lot of functionality was not implemented elsewhere (so isn't possible to compare results). I think it should be possible to reduce complexity and divide acceptance of json_table to some steps like basic functionality (on MySQL level), enhanced functionality (on Oracle level), and full functionality (the Postgres will be first). This functionality is interesting and maximal conformity with SQL/JSON is great so I am for merging it. But if it will be divided into some chronological steps, then there can be higher probability of merging to upstream in a good time. Regards Pavel > -- > Simon Riggshttp://www.EnterpriseDB.com/ >
Re: REINDEX backend filtering
On Tue, Dec 15, 2020 at 12:22 PM Julien Rouhaud wrote: > > On Mon, Dec 14, 2020 at 3:45 PM Michael Paquier wrote: > > > > On Thu, Dec 03, 2020 at 05:31:43PM +0800, Julien Rouhaud wrote: > > > Now that we have the infrastructure to track indexes that might be > > > corrupted > > > due to changes in collation libraries, I think it would be a good idea to > > > offer > > > an easy way for users to reindex all indexes that might be corrupted. > > > > Yes. It would be a good thing. > > > > > The filter is also implemented so that you could cumulate multiple > > > filters, so > > > it could be easy to add more filtering, for instance: > > > > > > REINDEX (COLLATION 'libc', COLLATION 'not_current') DATABASE mydb; > > > > > > to only rebuild indexes depending on outdated libc collations, or > > > > > > REINDEX (COLLATION 'libc', VERSION 'X.Y') DATABASE mydb; > > > > > > to only rebuild indexes depending on a specific version of libc. > > > > Deciding on the grammar to use depends on the use cases we would like > > to satisfy. From what I heard on this topic, the goal is to reduce > > the amount of time necessary to reindex a system so as REINDEX only > > works on indexes whose dependent collation versions are not known or > > works on indexes in need of a collation refresh (like a reindexdb > > --all --collation -j $jobs). What would be the benefit in having more > > complexity with library-dependent settings while we could take care > > of the use cases that matter the most with a simple grammar? Perhaps > > "not_current" is not the best match as a keyword, we could just use > > "collation" and handle that as a boolean. As long as we don't need > > new operators in the grammar rules.. > > I'm not sure what the DBA usual pattern here. If the reindexing > runtime is really critical, I'm assuming that at least some people > will dig into library details to see what are the collations that > actually broke in the last upgrade and will want to reindex only > those, and force the version for the rest of the indexes. And > obviously, they probably won't wait to have multiple collation > versions dependencies before taking care of that. In that case the > filters that would matters would be one to only keep indexes with an > outdated collation version, and an additional one for a specific > collation name. Or we could have the COLLATION keyword without > additional argument mean all outdated collations, and COLLATION > 'collation_name' to specify a specific one. This is maybe a bit ugly, > and would probably require a different approach for reindexdb. Is this really a common enough operation that we need it i the main grammar? Having the functionality, definitely, but what if it was "just" a function instead? So you'd do something like: SELECT 'reindex index ' || i FROM pg_blah(some, arguments, here) \gexec Or even a function that returns the REINDEX commands directly (taking a parameter to turn on/off concurrency for example). That also seems like it would be easier to make flexible, and just as easy to plug into reindexdb? -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: On login trigger: take three
> Please notice that we still need GUC to disable on-login triggers: to make >> it possible for superuser who did mistake and defined incorrect on-login >> trigger to >> login to the system. >> Do we need GUC to disable all other event triggers? May be I am wrong, >> but I do not see much need in such GUC: error in any of such event triggers >> is non fatal >> and can be easily reverted. >> So the only question is whether "disable_client_connection_trigger" >> should be true by default or not... >> >> I agree with you that @2 is a little bit chaotic and @1 looks like a >> workaround. >> But from my point of view @3 is not the best solution but overkill: >> maintaining yet another shared hash just to save few milliseconds on login >> seems to be too high price. >> Actually there are many things which are loaded by new backend from the >> database on start: for example - catalog. >> This is why launch of new backend is an expensive operation. >> Certainly if we execute "select 1", then system catalog is not needed... >> But does anybody start new backend just to execute "select 1" and exit? >> >> >> > I understand so the implementation of a new shared cache can be a lot of > work. The best way is enhancing pg_database about one column with > information about the login triggers (dathaslogontriggers). In init time > these data are in syscache, and can be easily checked. Some like > pg_attribute have an atthasdef column. Same fields has pg_class - > relhasrules, relhastriggers, ... Then the overhead of this design should be > really zero. > > What do you think about it? > > I like this approach more than implementation of shared hash. > But still I have some concerns: > > 1. pg_database table format has to be changed. Certainly it is not > something completely unacceptable. But IMHO we should try to avoid > modification of such commonly used catalog tables as much as possible. > yes, I know. Unfortunately I don't see any other and correct solution. There should be more wide discussion before this work about this topic. On second hand, this change should not break anything (if this new field will be placed on as the last field). The logon trigger really looks like a database trigger - so I think so this semantic is correct. I have no idea if it is acceptable for committers :-/. I hope so. The fact that the existence of a logon trigger can be visible from a shared database object can be an interesting feature too. > 2. It is not so easy to maintain this flag. There can be multiple on-login > triggers defined. If such trigger is dropped, we can not just clear this > flag. > We should check if other triggers exist. Now assume that there are two > triggers and two concurrent transactions dropping each one. > According to their snapshots them do not see changes made by other > transaction. So them remove both triggers but didn't clear the flag. > Certainly we can use counter instead of flag. But I am not sure that their > will be not other problems with maintaining counter. > I don't think it is necessary. My opinion is not too strong, but if pg_class doesn't need to hold a number of triggers, then I think so pg_database doesn't need to hold this number too. > -- > Konstantin Knizhnik > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company > >
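As a rough sketch of the fast path being proposed here (assuming a new pg_database column named dathaslogontriggers; the helper name and call site are illustrative only, not code from any patch), the backend could check the flag from the syscache at connection start and skip the event-trigger machinery entirely when it is false:

#include "postgres.h"
#include "access/htup_details.h"
#include "catalog/pg_database.h"
#include "utils/syscache.h"

/* Sketch: does the database we are connecting to have any logon triggers? */
static bool
DatabaseHasLogonTriggers(Oid dbid)
{
	HeapTuple	tup = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(dbid));
	bool		result = false;

	if (HeapTupleIsValid(tup))
	{
		/* dathaslogontriggers is the hypothetical new flag column */
		result = ((Form_pg_database) GETSTRUCT(tup))->dathaslogontriggers;
		ReleaseSysCache(tup);
	}
	return result;
}

Since the pg_database row is already fetched through the syscache during backend startup, a check like this should cost essentially nothing when no logon trigger exists, which is the zero-overhead property being discussed.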
Re: SQL/JSON: functions
On Fri, 17 Jul 2020 at 21:26, Nikita Glukhov wrote: > > Attached 50th version of the patches. Only the documentation was changed > since the previous version. I can imagine the effort required to get to v50, so I salute your efforts. The document for SQL Standard has now been published as CD 9075-2-Foundation (Thanks Peter). That gives us a clearer picture of what is being voted on and should allow Nikita to complete his work. I suggest we move forwards on this now, but if anyone objects to including this in PG14 in favour of waiting for the vote, please say so clearly so we can skip to PG15. -- Simon Riggshttp://www.EnterpriseDB.com/
Re: Proposed patch for key managment
On Tue, Dec 15, 2020 at 10:36:56AM +0800, Neil Chen wrote: > 2. I tried to add support for AES_CTR mode, and the code for encrypting buffer > blocks. In the process I found that in pg_cipher_ctx_create, the key length is > declared as "byte". However, in the CryptoKey structure, the length is stored > as "bit", which leads me to use a form similar to Key->klen / 8 when I call > this function. Maybe we should unify the two to avoid unnecessary confusion. Yes, I would also like to get opinions on this. We certainly have to have the key length be in _bit_ units when visible by users, but I see a lot of cases where we allocate arrays based on bytes. I am unclear where the proper units should be. At a minimum, we should specify the units in the function parameter names. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
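One way to keep the units straight, whichever unit is chosen, is to encode the unit in the field and parameter names and convert in exactly one place. The snippet below only illustrates that convention; DemoCryptoKey and the helper are stand-ins, not the patch's CryptoKey or its API.

/* Stand-in structure; the real CryptoKey differs. */
typedef struct DemoCryptoKey
{
	int			klen_bits;		/* user-visible key length, in bits */
	unsigned char key[64];		/* key material */
} DemoCryptoKey;

/* Convert once, at the boundary where a byte count is required. */
static inline int
demo_key_len_bytes(const DemoCryptoKey *k)
{
	return k->klen_bits / 8;
}

/* e.g. pg_cipher_ctx_create(cipher, k->key, demo_key_len_bytes(k)); */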
Re: On login trigger: take three
On 15.12.2020 18:25, Pavel Stehule wrote:

út 15. 12. 2020 v 15:06 odesílatel Konstantin Knizhnik <k.knizh...@postgrespro.ru> napsal:

On 15.12.2020 16:18, Pavel Stehule wrote:

út 15. 12. 2020 v 14:12 odesílatel Konstantin Knizhnik <k.knizh...@postgrespro.ru> napsal:

On 11.12.2020 19:27, Pavel Stehule wrote:

pá 11. 12. 2020 v 17:05 odesílatel Konstantin Knizhnik <k.knizh...@postgrespro.ru> napsal:

On 11.12.2020 18:40, Pavel Stehule wrote:

is not correct. It makes it not possible to superuser to disable triggers for all users.

pg_database_ownercheck returns true for superuser always.

Sorry, but I consider different case: when normal user is connected to the database. In this case pg_database_ownercheck returns false and trigger is not disabled, isn't it?

My idea was to reduce necessary rights to database owners. But you have a true, so only superuser can create event trigger, so this feature cannot be used in DBaaS environments, and then my original idea was wrong.

Also GUCs are not associated with any database. So I do not understand why this check of database ownership is relevant at all? What kind of protection violation we want to prevent? It seems to be obvious that normal user should not be able to prevent trigger execution because this triggers may be used to enforce some security policies. If trigger was created by user itself, then it can drop or disable it using ALTER statement. GUC is not needed for it.

when you cannot connect to the database, then you cannot do ALTER. In DBaaS environments lot of users has not superuser rights.

But only superusers can set login triggers, right? So only superuser can make a mistake in this trigger. But he have enough rights to recover this error. Normal users are not able to define on connection triggers and should not have rights to disable them.

yes, it is true

Pavel

-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

So what's next? I see three options:

1. Do not introduce GUC for disabling all event triggers (especially taken in account that them are disabled by default). Return back to the patch on_connect_event_trigger_WITH_SUGGESTED_UPDATES.patch with "disable_client_connection_trigger" and make it true by default (to eliminate any overhead for users which are not using on logintriggers).

2. Have two GUCS: "disable_client_connection_trigger" and "disable_event_triggers".

3. Implement some mechanism for caching presence of event triggers in shared memory.

@3 is the best design (the things should work well by default), @2 is a little bit chaotic and @1 looks like a workaround.

Please notice that we still need GUC to disable on-login triggers: to make it possible for superuser who did mistake and defined incorrect on-login trigger to login to the system. Do we need GUC to disable all other event triggers? May be I am wrong, but I do not see much need in such GUC: error in any of such event triggers is non fatal and can be easily reverted. So the only question is whether "disable_client_connection_trigger" should be true by default or not...

I agree with you that @2 is a little bit chaotic and @1 looks like a workaround. But from my point of view @3 is not the best solution but overkill: maintaining yet another shared hash just to save few milliseconds on login seems to be too high price. Actually there are many things which are loaded by new backend from the database on start: for example - catalog. This is why launch of new backend is an expensive operation. Certainly if we execute "select 1", then system catalog is not needed... But does anybody start new backend just to execute "select 1" and exit?

I understand so the implementation of a new shared cache can be a lot of work. The best way is enhancing pg_database about one column with information about the login triggers (dathaslogontriggers). In init time these data are in syscache, and can be easily checked. Some like pg_attribute have an atthasdef column. Same fields has pg_class - relhasrules, relhastriggers, ... Then the overhead of this design should be really zero. What do you think about it?

I like this approach more than implementation of shared hash. But still I have
Re: Proposed patch for key managment
On Mon, Dec 14, 2020 at 11:16:18PM -0500, Bruce Momjian wrote: > > 1. Previously, we added a variable bootstrap_keys_wrap that is used for > > encryption during initdb. However, since we save the "wrapped" key, we need > > to > > use a global KEK that can be accessed in boot mode to unwrap it before > > use... I > > don't know if that's good. To make it simple, I modified the > > bootstrap_keys_wrap to store the "unwrapped" key so that the encryption > > function can get it correctly. (The variable name should be changed > > accordingly). > > I see what you are saying. We store the wrapped in bootstrap mode, but > the unwrapped in normal mode. There is also the case of when we copy > the keys from an old cluster. I will work on a patch tomorrow and > report back here. I had not considered that we need the date keys available in bootstrap mode, even if we copied them from another cluster during pg_upgrade. I have updated the diff URLs and attaching a patch showing the changes I made. Basically, I had to separate BootStrapKmgr() into sections: 1. copy or create an empty live key directory 2. get the pass phrase 3. populate the live key directory if we didn't copy it 4. decrypt they keys into a file-scoped variable Thanks for showing me this missing feature. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee diff --git a/src/backend/crypto/kmgr.c b/src/backend/crypto/kmgr.c new file mode 100644 index 9143e72..24c5d7f *** a/src/backend/crypto/kmgr.c --- b/src/backend/crypto/kmgr.c *** static KmgrShmemData *KmgrShmem; *** 50,56 char *cluster_passphrase_command = NULL; int file_encryption_keylen = 0; ! CryptoKey bootstrap_keys_wrap[KMGR_MAX_INTERNAL_KEYS]; extern char *bootstrap_old_key_datadir; extern int bootstrap_file_encryption_keylen; --- 50,56 char *cluster_passphrase_command = NULL; int file_encryption_keylen = 0; ! CryptoKey bootstrap_keys[KMGR_MAX_INTERNAL_KEYS]; extern char *bootstrap_old_key_datadir; extern int bootstrap_file_encryption_keylen; *** static CryptoKey *generate_crypto_key(in *** 65,74 void BootStrapKmgr(void) { ! PgKeyWrapCtx *ctx; char passphrase[KMGR_MAX_PASSPHRASE_LEN]; - uint8 KEK_enc[KMGR_ENC_KEY_LEN]; - uint8 KEK_hmac[KMGR_MAC_KEY_LEN]; int passlen; #ifndef USE_OPENSSL --- 65,74 void BootStrapKmgr(void) { ! char live_path[MAXPGPATH]; ! CryptoKey *keys_wrap; ! int nkeys; char passphrase[KMGR_MAX_PASSPHRASE_LEN]; int passlen; #ifndef USE_OPENSSL *** BootStrapKmgr(void) *** 78,83 --- 78,85 errhint("Compile with --with-openssl to use cluster encryption."; #endif + snprintf(live_path, sizeof(live_path), "%s/%s", DataDir, LIVE_KMGR_DIR); + /* copy cluster file encryption keys from an old cluster? */ if (bootstrap_old_key_datadir != NULL) { *** BootStrapKmgr(void) *** 87,122 bootstrap_old_key_datadir, LIVE_KMGR_DIR); copydir(old_key_dir, LIVE_KMGR_DIR, true); } ! /* generate new cluster file encryption keys */ else { ! char live_path[MAXPGPATH]; ! ! 
if (mkdir(LIVE_KMGR_DIR, pg_dir_create_mode) < 0) ereport(ERROR, (errcode_for_file_access(), errmsg("could not create cluster file encryption directory \"%s\": %m", LIVE_KMGR_DIR))); - memset(bootstrap_keys_wrap, 0, sizeof(bootstrap_keys_wrap)); - /* bzero keys on exit */ - on_proc_exit(bzeroKmgrKeys, 0); - - /* Get key encryption key from the passphrase command */ - snprintf(live_path, sizeof(live_path), "%s/%s", DataDir, LIVE_KMGR_DIR); - passlen = kmgr_run_cluster_passphrase_command(cluster_passphrase_command, - passphrase, KMGR_MAX_PASSPHRASE_LEN, - live_path); - if (passlen < KMGR_MIN_PASSPHRASE_LEN) - ereport(ERROR, - (errmsg("passphrase must be at least %d bytes", - KMGR_MIN_PASSPHRASE_LEN))); - /* Get key encryption key and HMAC key from passphrase */ kmgr_derive_keys(passphrase, passlen, KEK_enc, KEK_hmac); - explicit_bzero(passphrase, passlen); - /* Create temporary key wrap context */ ctx = pg_create_keywrap_ctx(KEK_enc, KEK_hmac); if (!ctx) --- 89,128 bootstrap_old_key_datadir, LIVE_KMGR_DIR); copydir(old_key_dir, LIVE_KMGR_DIR, true); } ! /* create empty directory */ else { ! if (mkdir(LIVE_KMGR_DIR, pg_dir_create_mode) < 0) ereport(ERROR, (errcode_for_file_access(), errmsg("could not create cluster file encryption directory \"%s\": %m", LIVE_KMGR_DIR))); + } + + /* + * Get key encryption key from the passphrase command. The passphrase + * command might want to check for the existance of files in the + * live directory, so run this _after_ copying the directory in place. + */ + passlen =
Re: On login trigger: take three
út 15. 12. 2020 v 15:06 odesílatel Konstantin Knizhnik < k.knizh...@postgrespro.ru> napsal: > > > On 15.12.2020 16:18, Pavel Stehule wrote: > > > > út 15. 12. 2020 v 14:12 odesílatel Konstantin Knizhnik < > k.knizh...@postgrespro.ru> napsal: > >> >> >> On 11.12.2020 19:27, Pavel Stehule wrote: >> >> >> >> pá 11. 12. 2020 v 17:05 odesílatel Konstantin Knizhnik < >> k.knizh...@postgrespro.ru> napsal: >> >>> >>> >>> On 11.12.2020 18:40, Pavel Stehule wrote: >>> >>> >>> is not correct. It makes it not possible to superuser to disable triggers for all users. >>> >>> pg_database_ownercheck returns true for superuser always. >>> >>> >>> Sorry, but I consider different case: when normal user is connected to >>> the database. >>> In this case pg_database_ownercheck returns false and trigger is not >>> disabled, isn't it? >>> >> >> My idea was to reduce necessary rights to database owners. But you have >> a true, so only superuser can create event trigger, so this feature cannot >> be used in DBaaS environments, and then my original idea was wrong. >> >> >>> >>> Also GUCs are not associated with any database. So I do not understand why this check of database ownership is relevant at all? What kind of protection violation we want to prevent? It seems to be obvious that normal user should not be able to prevent trigger execution because this triggers may be used to enforce some security policies. If trigger was created by user itself, then it can drop or disable it using ALTER statement. GUC is not needed for it. >>> >>> when you cannot connect to the database, then you cannot do ALTER. In >>> DBaaS environments lot of users has not superuser rights. >>> >>> >>> >>> But only superusers can set login triggers, right? >>> So only superuser can make a mistake in this trigger. But he have enough >>> rights to recover this error. Normal users are not able to define on >>> connection triggers and >>> should not have rights to disable them. >>> >> >> yes, it is true >> >> Pavel >> >> >>> -- >>> Konstantin Knizhnik >>> Postgres Professional: http://www.postgrespro.com >>> The Russian Postgres Company >>> >>> >> So what's next? >> I see three options: >> >> 1. Do not introduce GUC for disabling all event triggers (especially >> taken in account that them are disabled by default). >> Return back to the patch >> on_connect_event_trigger_WITH_SUGGESTED_UPDATES.patch with >> "disable_client_connection_trigger" >> and make it true by default (to eliminate any overhead for users which >> are not using on logintriggers). >> >> 2. Have two GUCS: "disable_client_connection_trigger" and >> "disable_event_triggers". >> >> 3. Implement some mechanism for caching presence of event triggers in >> shared memory. >> > > @3 is the best design (the things should work well by default), @2 is a > little bit chaotic and @1 looks like a workaround. > > > Please notice that we still need GUC to disable on-login triggers: to make > it possible for superuser who did mistake and defined incorrect on-login > trigger to > login to the system. > Do we need GUC to disable all other event triggers? May be I am wrong, but > I do not see much need in such GUC: error in any of such event triggers is > non fatal > and can be easily reverted. > So the only question is whether "disable_client_connection_trigger" should > be true by default or not... > > I agree with you that @2 is a little bit chaotic and @1 looks like a > workaround. 
> But from my point of view @3 is not the best solution but overkill: > maintaining yet another shared hash just to save few milliseconds on login > seems to be too high price. > Actually there are many things which are loaded by new backend from the > database on start: for example - catalog. > This is why launch of new backend is an expensive operation. > Certainly if we execute "select 1", then system catalog is not needed... > But does anybody start new backend just to execute "select 1" and exit? > > > I understand so the implementation of a new shared cache can be a lot of work. The best way is enhancing pg_database about one column with information about the login triggers (dathaslogontriggers). In init time these data are in syscache, and can be easily checked. Some like pg_attribute have an atthasdef column. Same fields has pg_class - relhasrules, relhastriggers, ... Then the overhead of this design should be really zero. What do you think about it? Pavel > > > > > Regards > > Pavel > > >> >> -- >> Konstantin Knizhnik >> Postgres Professional: http://www.postgrespro.com >> The Russian Postgres Company >> >> > -- > Konstantin Knizhnik > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company > >
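To visualize the suggestion: a backend-startup check against such a flag could be as cheap as the sketch below. This is purely hypothetical -- the dathaslogontriggers column does not exist in pg_database today, and the helper name is made up -- but it shows that the flag would already be part of the DATABASEOID syscache entry that connection startup fetches anyway, so the per-login overhead would be essentially zero.

/*
 * Hypothetical sketch: "dathaslogontriggers" is the pg_database column
 * proposed above; it does not exist yet, and this helper is illustrative
 * only.
 */
#include "postgres.h"
#include "access/htup_details.h"
#include "catalog/pg_database.h"
#include "miscadmin.h"
#include "utils/syscache.h"

static bool
database_has_logon_triggers(void)
{
	HeapTuple	tup;
	bool		result;

	tup = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
	if (!HeapTupleIsValid(tup))
		elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);

	result = ((Form_pg_database) GETSTRUCT(tup))->dathaslogontriggers;
	ReleaseSysCache(tup);

	return result;
}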
Re: Proposed patch for key managment
On Tue, Dec 15, 2020 at 02:20:33PM +0900, Michael Paquier wrote: > > Uh, I got this code from pg_resetwal.c, which does something similar to > > pg_altercpass. > > Yeah, that's actually the part where it is a bad idea to just copy > this pattern. pg_resetwal should not do that in the long term in my > opinion, but nobody has come to clean up this stuff. (- -;) Glad you asked about this. It turns out pg_altercpass works fine with postgres_fe.h, so I will now use that. pg_resetwal still can't use it though. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Speeding up GIST index creation for tsvectors
On Sun, 13 Dec 2020 at 9:28 PM, Andrey Borodin wrote: > +1 > This will make all INSERTs and UPDATES for tsvector's GiSTs. Oh, I didn't realize that this code is getting used in GIST index insertion and creation too. Will check there. > Also I really like idea of taking advantage of hardware capabilities like > __builtin_* etc wherever possible. Yes. Also, the __builtin_popcount() uses SIMD vectorization (on arm64 : "cnt v0.8b, v0.8b"), hence there's all the more reason to use it. Over and above that, I had thought that if we can auto-vectorize the byte-by-byte xor operation and the popcount() call using compiler optimizations, we would benefit out of this, but didn't see any more improvement. I hoped for the benefit because that would have allowed us to process in 128-bit chunks or 256-bit chunks, since the vector registers are at least that long. Maybe gcc is not that smart to translate __builtin_popcount() to 128/256 bit vectorized instruction. But for XOR operator, it does translate to 128bit vectorized instructions (on arm64 : "eor v2.16b, v2.16b, v18.16b") > Meanwhile there are at least 4 incarnation of hemdistsign() functions that > are quite similar. I'd propose to refactor them somehow... Yes, I hope we get the benefit there also. Before that, I thought I should post the first use-case to get some early comments. Thanks for your encouraging comments :)
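For readers following along, the inner loop being discussed looks roughly like the sketch below: XOR the two signatures in 64-bit chunks and let __builtin_popcountll() count the differing bits. The function name and the siglen parameter are illustrative, not the actual hemdistsign() signature.

#include <stdint.h>
#include <string.h>

/*
 * Illustrative only: Hamming distance between two signatures of siglen
 * bytes, processed 64 bits at a time.  memcpy() avoids alignment
 * assumptions; the compiler typically turns it into a plain load.
 */
static int
sig_hamming_distance(const unsigned char *a, const unsigned char *b, int siglen)
{
	int			dist = 0;
	int			i = 0;

	for (; i + 8 <= siglen; i += 8)
	{
		uint64_t	wa;
		uint64_t	wb;

		memcpy(&wa, a + i, sizeof(wa));
		memcpy(&wb, b + i, sizeof(wb));
		dist += __builtin_popcountll(wa ^ wb);
	}

	/* handle any remaining tail bytes one at a time */
	for (; i < siglen; i++)
		dist += __builtin_popcount((unsigned int) (a[i] ^ b[i]));

	return dist;
}

On compilers or platforms without the builtin, a per-byte lookup table would still work as a portable fallback, which is presumably what a committed version of the patch would have to provide.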
Re: HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions)
Noah Misch writes: > On Mon, Dec 14, 2020 at 01:59:03PM -0500, Tom Lane wrote: >> Here's a rolled-up patch that does some further documentation work >> and gets rid of the unnecessary memset's as well. > +1 on removing the memset() calls. That said, it's not a big deal if more > creep in over time; it doesn't qualify as a project policy violation. Right, that part is just neatnik-ism. Neither the calls with memset nor the ones without are buggy. >> + * *infoP and hash_flags should specify at least the entry sizes and key > s/should/must/ OK; thanks for reviewing! regards, tom lane
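As a concrete (illustrative) example of the convention the revised comment describes: a caller that passes only HASH_ELEM | HASH_BLOBS needs to fill in just keysize and entrysize and can leave the rest of the HASHCTL untouched, with no memset. The entry type and table name below are made up.

#include "postgres.h"
#include "utils/hsearch.h"

typedef struct DemoEntry
{
	Oid			key;			/* hash key; must be the first field */
	int			counter;		/* payload */
} DemoEntry;

static HTAB *
create_demo_hash(void)
{
	HASHCTL		ctl;			/* deliberately not memset: only the fields
								 * named by the flags below are consulted */

	ctl.keysize = sizeof(Oid);
	ctl.entrysize = sizeof(DemoEntry);

	return hash_create("demo hash", 128, &ctl, HASH_ELEM | HASH_BLOBS);
}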
Re: On login trigger: take three
út 15. 12. 2020 v 15:06 odesílatel Konstantin Knizhnik < k.knizh...@postgrespro.ru> napsal: > > > On 15.12.2020 16:18, Pavel Stehule wrote: > > > > út 15. 12. 2020 v 14:12 odesílatel Konstantin Knizhnik < > k.knizh...@postgrespro.ru> napsal: > >> >> >> On 11.12.2020 19:27, Pavel Stehule wrote: >> >> >> >> pá 11. 12. 2020 v 17:05 odesílatel Konstantin Knizhnik < >> k.knizh...@postgrespro.ru> napsal: >> >>> >>> >>> On 11.12.2020 18:40, Pavel Stehule wrote: >>> >>> >>> is not correct. It makes it not possible to superuser to disable triggers for all users. >>> >>> pg_database_ownercheck returns true for superuser always. >>> >>> >>> Sorry, but I consider different case: when normal user is connected to >>> the database. >>> In this case pg_database_ownercheck returns false and trigger is not >>> disabled, isn't it? >>> >> >> My idea was to reduce necessary rights to database owners. But you have >> a true, so only superuser can create event trigger, so this feature cannot >> be used in DBaaS environments, and then my original idea was wrong. >> >> >>> >>> Also GUCs are not associated with any database. So I do not understand why this check of database ownership is relevant at all? What kind of protection violation we want to prevent? It seems to be obvious that normal user should not be able to prevent trigger execution because this triggers may be used to enforce some security policies. If trigger was created by user itself, then it can drop or disable it using ALTER statement. GUC is not needed for it. >>> >>> when you cannot connect to the database, then you cannot do ALTER. In >>> DBaaS environments lot of users has not superuser rights. >>> >>> >>> >>> But only superusers can set login triggers, right? >>> So only superuser can make a mistake in this trigger. But he have enough >>> rights to recover this error. Normal users are not able to define on >>> connection triggers and >>> should not have rights to disable them. >>> >> >> yes, it is true >> >> Pavel >> >> >>> -- >>> Konstantin Knizhnik >>> Postgres Professional: http://www.postgrespro.com >>> The Russian Postgres Company >>> >>> >> So what's next? >> I see three options: >> >> 1. Do not introduce GUC for disabling all event triggers (especially >> taken in account that them are disabled by default). >> Return back to the patch >> on_connect_event_trigger_WITH_SUGGESTED_UPDATES.patch with >> "disable_client_connection_trigger" >> and make it true by default (to eliminate any overhead for users which >> are not using on logintriggers). >> >> 2. Have two GUCS: "disable_client_connection_trigger" and >> "disable_event_triggers". >> >> 3. Implement some mechanism for caching presence of event triggers in >> shared memory. >> > > @3 is the best design (the things should work well by default), @2 is a > little bit chaotic and @1 looks like a workaround. > > > Please notice that we still need GUC to disable on-login triggers: to make > it possible for superuser who did mistake and defined incorrect on-login > trigger to > login to the system. > Do we need GUC to disable all other event triggers? May be I am wrong, but > I do not see much need in such GUC: error in any of such event triggers is > non fatal > and can be easily reverted. > So the only question is whether "disable_client_connection_trigger" should > be true by default or not... > > I agree with you that @2 is a little bit chaotic and @1 looks like a > workaround. 
> But from my point of view @3 is not the best solution but overkill: > maintaining yet another shared hash just to save few milliseconds on login > seems to be too high price. > Actually there are many things which are loaded by new backend from the > database on start: for example - catalog. > This is why launch of new backend is an expensive operation. > Certainly if we execute "select 1", then system catalog is not needed... > But does anybody start new backend just to execute "select 1" and exit? > This is the worst case but tested on a not too bad CPU (i7). In virtual environments with overloaded CPU the effect can be worse. > > > > > > > Regards > > Pavel > > >> >> -- >> Konstantin Knizhnik >> Postgres Professional: http://www.postgrespro.com >> The Russian Postgres Company >> >> > -- > Konstantin Knizhnik > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company > >
Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module
On 2020/11/04 18:06, Fujii Masao wrote: On 2020/10/29 15:21, Fujii Masao wrote: On 2020/10/07 11:30, Bharath Rupireddy wrote: On Tue, Oct 6, 2020 at 11:41 AM Bharath Rupireddy wrote: On Tue, Oct 6, 2020 at 11:20 AM Fujii Masao wrote: +1 Or it's also ok to make each patch separately. Anyway, thanks! Thanks. +1 to have separate patches. I will post two separate patches for autoprewarm and WalRcvShutdownHandler for further review. The other two patches can be for startup.c and syslogger.c. I'm attaching all the 4 patches here together. 1. v1-use-standard-SIGHUP-and-SIGTERM-handlers-in-autoprewarm-process.patch -- This is the patch initially sent in this mail by me, I just renamed it. 2. v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch -- This is the patch proposed by Fujii Masao, I just renamed it. 3. v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch 4. v1-use-standard-SIGTERM-handler-in-walreceiver-process.patch Please consider these patches for further review. Thanks for the patches! They look good to me. So barring any objections, I will commit them one by one. I pushed the following two patches. - v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch - v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch

As I told in the other thread [1], I'm thinking to revert this patch because this change could have a bad side-effect on the startup process waiting for recovery conflict. Before applying the patch, the latch that the startup process used to wait for recovery conflict was different from the latch that the SIGHUP signal handler or walreceiver process, etc. set to wake the startup process up. So SIGHUP or walreceiver didn't wake the startup process waiting for recovery conflict up unnecessarily. But the patch got rid of the dedicated latch for signaling the startup process. This change forced us to use the same latch to make the startup process wait or wake up, which caused the SIGHUP signal handler or walreceiver process to wake the startup process waiting on the latch for recovery conflict unnecessarily frequently. While waiting for recovery conflict on buffer pin, deadlock needs to be checked at least every deadlock_timeout. But those frequent wakeups could prevent the deadlock timer from being triggered and could delay those deadlock checks.

Therefore, I'm thinking to revert the commits ac22929a26 and 113d3591b8.

[1] https://www.postgresql.org/message-id/a802f1c0-58d9-bd3f-bc3e-bdad54726...@oss.nttdata.com

Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
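For reference, the "standard" handlers discussed in this thread follow roughly the pattern sketched below (simplified; the handler names come from src/include/postmaster/interrupt.h, and the wait timeout and wait event are arbitrary illustrative choices). Both handlers end by setting MyLatch, which is exactly why sharing that latch with the recovery-conflict wait in the startup process becomes a problem.

#include "postgres.h"
#include <signal.h>
#include "miscadmin.h"
#include "pgstat.h"
#include "postmaster/interrupt.h"
#include "storage/latch.h"
#include "utils/guc.h"

/* simplified main loop of an auxiliary process using the standard handlers */
static void
example_main_loop(void)
{
	pqsignal(SIGHUP, SignalHandlerForConfigReload);
	pqsignal(SIGTERM, SignalHandlerForShutdownRequest);

	for (;;)
	{
		if (ShutdownRequestPending)
			break;

		if (ConfigReloadPending)
		{
			ConfigReloadPending = false;
			ProcessConfigFile(PGC_SIGHUP);
		}

		/* both handlers call SetLatch(MyLatch), which wakes this wait */
		(void) WaitLatch(MyLatch,
						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
						 10 * 1000L,	/* 10 seconds, illustrative */
						 WAIT_EVENT_PG_SLEEP);
		ResetLatch(MyLatch);
	}
}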
Re: On login trigger: take three
On 15.12.2020 16:18, Pavel Stehule wrote: út 15. 12. 2020 v 14:12 odesílatel Konstantin Knizhnik mailto:k.knizh...@postgrespro.ru>> napsal: On 11.12.2020 19:27, Pavel Stehule wrote: pá 11. 12. 2020 v 17:05 odesílatel Konstantin Knizhnik mailto:k.knizh...@postgrespro.ru>> napsal: On 11.12.2020 18:40, Pavel Stehule wrote: is not correct. It makes it not possible to superuser to disable triggers for all users. pg_database_ownercheck returns true for superuser always. Sorry, but I consider different case: when normal user is connected to the database. In this case pg_database_ownercheck returns false and trigger is not disabled, isn't it? My idea was to reduce necessary rights to database owners. But you have a true, so only superuser can create event trigger, so this feature cannot be used in DBaaS environments, and then my original idea was wrong. Also GUCs are not associated with any database. So I do not understand why this check of database ownership is relevant at all? What kind of protection violation we want to prevent? It seems to be obvious that normal user should not be able to prevent trigger execution because this triggers may be used to enforce some security policies. If trigger was created by user itself, then it can drop or disable it using ALTER statement. GUC is not needed for it. when you cannot connect to the database, then you cannot do ALTER. In DBaaS environments lot of users has not superuser rights. But only superusers can set login triggers, right? So only superuser can make a mistake in this trigger. But he have enough rights to recover this error. Normal users are not able to define on connection triggers and should not have rights to disable them. yes, it is true Pavel -- Konstantin Knizhnik Postgres Professional:http://www.postgrespro.com The Russian Postgres Company So what's next? I see three options: 1. Do not introduce GUC for disabling all event triggers (especially taken in account that them are disabled by default). Return back to the patch on_connect_event_trigger_WITH_SUGGESTED_UPDATES.patch with "disable_client_connection_trigger" and make it true by default (to eliminate any overhead for users which are not using on logintriggers). 2. Have two GUCS: "disable_client_connection_trigger" and "disable_event_triggers". 3. Implement some mechanism for caching presence of event triggers in shared memory. @3 is the best design (the things should work well by default), @2 is a little bit chaotic and @1 looks like a workaround. Please notice that we still need GUC to disable on-login triggers: to make it possible for superuser who did mistake and defined incorrect on-login trigger to login to the system. Do we need GUC to disable all other event triggers? May be I am wrong, but I do not see much need in such GUC: error in any of such event triggers is non fatal and can be easily reverted. So the only question is whether "disable_client_connection_trigger" should be true by default or not... I agree with you that @2 is a little bit chaotic and @1 looks like a workaround. But from my point of view @3 is not the best solution but overkill: maintaining yet another shared hash just to save few milliseconds on login seems to be too high price. Actually there are many things which are loaded by new backend from the database on start: for example - catalog. This is why launch of new backend is an expensive operation. Certainly if we execute "select 1", then system catalog is not needed... But does anybody start new backend just to execute "select 1" and exit? 
Regards Pavel -- Konstantin Knizhnik Postgres Professional:http://www.postgrespro.com The Russian Postgres Company -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Since '2001-09-09 01:46:40'::timestamp microseconds are lost when extracting epoch
Here is a new patch for this. This now follows the implementation that Tom has suggested: Leave date_part() alone, add a new set of extract() functions, and map the SQL EXTRACT construct to those. I have basically just copied over the implementations from my previous patch and placed them next to the existing date_part() implementations. So all the behavior is still the same as in the previous patches. One thing I still need to look into is how to not lose all the test coverage for date_part(). But that should be fairly mechanical, so I'm leaving it off in this version. -- Peter Eisentraut 2ndQuadrant, an EDB company https://www.2ndquadrant.com/ >From b40213b0d31b8ac55cda4658c21d06e701b6e273 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Tue, 15 Dec 2020 14:27:58 +0100 Subject: [PATCH v4] Change return type of EXTRACT to numeric The previous implementation of EXTRACT mapped internally to date_part(), which returned type double precision (since it was implemented long before the numeric type existed). This can lead to imprecise output in some cases, so returning numeric would be preferrable. Changing the return type of an existing function is a bit risky, so instead we do the following: We implement a new set of functions, which are now called "extract", in parallel to the existing date_part functions. They work the same way internally but use numeric instead of float8. The EXTRACT construct is now mapped by the parser to these new extract functions. That way, dumps of views etc. from old versions (which would use date_part) continue to work unchanged, but new uses will map to the new extract functions. Additionally, the reverse compilation of EXTRACT now reproduces the original syntax, using the new mechanism introduced in 40c24bfef92530bd846e111c1742c2a54441c62c. The following minor changes of behavior result from the new implementation: - The column name from an isolated EXTRACT call is now "extract" instead of "date_part". - Extract from date now rejects inappropriate field names such as HOUR. It was previously mapped internally to extract from timestamp, so it would silently accept everything appropriate for timestamp. - Return values when extracting fields with possibly fractional values, such as second and epoch, now have the full scale that the value has internally (so, for example, '1.00' instead of just '1'). Discussion: https://www.postgresql.org/message-id/flat/42b73d2d-da12-ba9f-570a-420e0cce1...@phystech.edu --- doc/src/sgml/func.sgml | 10 +- src/backend/parser/gram.y | 2 +- src/backend/utils/adt/date.c| 407 +++ src/backend/utils/adt/ruleutils.c | 21 + src/backend/utils/adt/timestamp.c | 708 +++- src/include/catalog/pg_proc.dat | 18 + src/test/regress/expected/create_view.out | 2 +- src/test/regress/expected/date.out | 482 ++--- src/test/regress/expected/interval.out | 72 +- src/test/regress/expected/psql_crosstab.out | 12 +- src/test/regress/expected/time.out | 24 +- src/test/regress/expected/timetz.out| 42 +- src/test/regress/sql/date.sql | 6 +- 13 files changed, 1445 insertions(+), 361 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index df29af6371..a42993c4a5 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -8705,7 +8705,7 @@ Date/Time Functions extract extract ( field from timestamp ) - double precision + numeric Get timestamp subfield; see @@ -8719,7 +8719,7 @@ Date/Time Functions extract ( field from interval ) - double precision + numeric Get interval subfield; see @@ -9234,7 +9234,7 @@ EXTRACT, date_part well.) 
field is an identifier or string that selects what field to extract from the source value. The extract function returns values of type -double precision. +numeric. The following are valid field names: @@ -9645,6 +9645,10 @@ EXTRACT, date_part be a string value, not a name. The valid field names for date_part are the same as for extract. +For historical reasons, the date_part function +returns values of type double precision. This can result in +a loss of precision in certain uses. Using extract +is recommended instead. diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 8f341ac006..2e8924aff8 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -13768,7 +13768,7 @@ func_expr_common_subexpr: { $$ = makeTypeCast($3, $5, @1); } | EXTRACT '(' extract_list ')' { - $$ = (Node *) makeFuncCall(
Re: TAP tests aren't using the magic words for Windows file access
On 12/15/20 12:05 AM, r.zhar...@postgrespro.ru wrote:
> Hello hackers,
>
> Are there any plans to backport the patch to earlier versions of the Postgres?
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=114541d58e5970e51b78b77b65de16210beaab43
>
> We rarely see the issue with the pg_ctl/004_logrotate test on the
> REL_12_STABLE branch. On my notebook I can easily reproduce the
> "Permission denied at src/test/perl/TestLib.pm line 259" error with the
> small change below. But the same test on the 13th version and the 12th
> version with the TestLib patch does not fail.
>
> diff --git a/src/bin/pg_ctl/t/004_logrotate.pl b/src/bin/pg_ctl/t/004_logrotate.pl
> index bc39abd23e4..e49e159bc84 100644
> --- a/src/bin/pg_ctl/t/004_logrotate.pl
> +++ b/src/bin/pg_ctl/t/004_logrotate.pl
> @@ -72,7 +72,7 @@ for (my $attempts = 0; $attempts < $max_attempts; $attempts++)
>  {
>  	$new_current_logfiles = slurp_file($node->data_dir . '/current_logfiles');
>  	last if $new_current_logfiles ne $current_logfiles;
> -	usleep(100_000);
> +	usleep(1);
>  }
>
>  note "now current_logfiles = $new_current_logfiles";

Oops, looks like that slipped off my radar somehow, I'll see about backpatching it right away.

cheers

andrew

-- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: On login trigger: take three
út 15. 12. 2020 v 14:12 odesílatel Konstantin Knizhnik < k.knizh...@postgrespro.ru> napsal: > > > On 11.12.2020 19:27, Pavel Stehule wrote: > > > > pá 11. 12. 2020 v 17:05 odesílatel Konstantin Knizhnik < > k.knizh...@postgrespro.ru> napsal: > >> >> >> On 11.12.2020 18:40, Pavel Stehule wrote: >> >> >> is not correct. It makes it not possible to superuser to disable triggers >>> for all users. >>> >> >> pg_database_ownercheck returns true for superuser always. >> >> >> Sorry, but I consider different case: when normal user is connected to >> the database. >> In this case pg_database_ownercheck returns false and trigger is not >> disabled, isn't it? >> > > My idea was to reduce necessary rights to database owners. But you have a > true, so only superuser can create event trigger, so this feature cannot be > used in DBaaS environments, and then my original idea was wrong. > > >> >> Also GUCs are not associated with any database. So I do not understand >>> why this check of database ownership is relevant at all? >>> >>> What kind of protection violation we want to prevent? >>> >>> It seems to be obvious that normal user should not be able to prevent >>> trigger execution because this triggers may be used to enforce some >>> security policies. >>> If trigger was created by user itself, then it can drop or disable it >>> using ALTER statement. GUC is not needed for it. >>> >> >> when you cannot connect to the database, then you cannot do ALTER. In >> DBaaS environments lot of users has not superuser rights. >> >> >> >> But only superusers can set login triggers, right? >> So only superuser can make a mistake in this trigger. But he have enough >> rights to recover this error. Normal users are not able to define on >> connection triggers and >> should not have rights to disable them. >> > > yes, it is true > > Pavel > > >> -- >> Konstantin Knizhnik >> Postgres Professional: http://www.postgrespro.com >> The Russian Postgres Company >> >> > So what's next? > I see three options: > > 1. Do not introduce GUC for disabling all event triggers (especially taken > in account that them are disabled by default). > Return back to the patch > on_connect_event_trigger_WITH_SUGGESTED_UPDATES.patch with > "disable_client_connection_trigger" > and make it true by default (to eliminate any overhead for users which are > not using on logintriggers). > > 2. Have two GUCS: "disable_client_connection_trigger" and > "disable_event_triggers". > > 3. Implement some mechanism for caching presence of event triggers in > shared memory. > @3 is the best design (the things should work well by default), @2 is a little bit chaotic and @1 looks like a workaround. Regards Pavel > > -- > Konstantin Knizhnik > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company > >
Re: On login trigger: take three
On 11.12.2020 19:27, Pavel Stehule wrote: pá 11. 12. 2020 v 17:05 odesílatel Konstantin Knizhnik mailto:k.knizh...@postgrespro.ru>> napsal: On 11.12.2020 18:40, Pavel Stehule wrote: is not correct. It makes it not possible to superuser to disable triggers for all users. pg_database_ownercheck returns true for superuser always. Sorry, but I consider different case: when normal user is connected to the database. In this case pg_database_ownercheck returns false and trigger is not disabled, isn't it? My idea was to reduce necessary rights to database owners. But you have a true, so only superuser can create event trigger, so this feature cannot be used in DBaaS environments, and then my original idea was wrong. Also GUCs are not associated with any database. So I do not understand why this check of database ownership is relevant at all? What kind of protection violation we want to prevent? It seems to be obvious that normal user should not be able to prevent trigger execution because this triggers may be used to enforce some security policies. If trigger was created by user itself, then it can drop or disable it using ALTER statement. GUC is not needed for it. when you cannot connect to the database, then you cannot do ALTER. In DBaaS environments lot of users has not superuser rights. But only superusers can set login triggers, right? So only superuser can make a mistake in this trigger. But he have enough rights to recover this error. Normal users are not able to define on connection triggers and should not have rights to disable them. yes, it is true Pavel -- Konstantin Knizhnik Postgres Professional:http://www.postgrespro.com The Russian Postgres Company So what's next? I see three options: 1. Do not introduce GUC for disabling all event triggers (especially taken in account that them are disabled by default). Return back to the patch on_connect_event_trigger_WITH_SUGGESTED_UPDATES.patch with "disable_client_connection_trigger" and make it true by default (to eliminate any overhead for users which are not using on logintriggers). 2. Have two GUCS: "disable_client_connection_trigger" and "disable_event_triggers". 3. Implement some mechanism for caching presence of event triggers in shared memory. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Movement of restart_lsn position movement of logical replication slots is very slow
On Tue, Dec 15, 2020 at 11:00 AM Jammie wrote: > > Thanks Amit for the response > > We are using pgJDBC sample program here > https://jdbc.postgresql.org/documentation/head/replication.html > > the setFlushLSN is coming from the pgJDBC only. > > git hub for APIs of pgJDBC methods available. > > https://github.com/pgjdbc/pgjdbc > > The second slot refers to "private" slot. > > So ""we are not doing reading from the stream' ==> It means that we are > having readPending call only from the shared slot then we get the > lastReceivedLSN() from stream and > send it back to stream as confirmed_flush_lsn for both private and shared > slot. We dont do readPending call to private slot. we will use private slot > only when we dont have choice. It is kind of reserver slot for us. >

I think this (not performing read/decode on the private slot) could be the reason why it is lagging behind. If you want to use it as a reserve slot then you probably want to at least perform pg_replication_slot_advance() to move it to the required position. The restart_lsn won't move unless you read/decode from that slot.

-- With Regards, Amit Kapila.
Re: Add session statistics to pg_stat_database
On Tue, 2020-12-15 at 13:53 +0100, Laurenz Albe wrote: > Attached is patch version 9. Aah, I forgot the ++. Version 10 attached. Yours, Laurenz Albe From b40e34141c80ff59c0005f430bd8c273918eb7bb Mon Sep 17 00:00:00 2001 From: Laurenz Albe Date: Tue, 15 Dec 2020 13:46:44 +0100 Subject: [PATCH] Add session statistics to pg_stat_database If "track_counts" is active, track the following per database: - total number of connections - number of sessions that ended by loss of network connection, fatal errors and operator intervention - total time spent in database sessions - total time spent executing queries - total idle in transaction time This is useful to check if connection pooling is working. It also helps to estimate the size of the connection pool required to keep the database busy, which depends on the percentage of the transaction time that is spent idling. Discussion: https://postgr.es/m/b07e1f9953701b90c66ed368656f2aef40cac4fb.ca...@cybertec.at Reviewed-By: Soumyadeep Chakraborty, Justin Pryzby, Masahiro Ikeda, Magnus Hagander (This requires a catversion bump, as well as an update to PGSTAT_FILE_FORMAT_ID) --- doc/src/sgml/monitoring.sgml | 69 ++ src/backend/catalog/system_views.sql | 7 ++ src/backend/postmaster/pgstat.c | 134 ++- src/backend/tcop/postgres.c | 10 +- src/backend/utils/adt/pgstatfuncs.c | 94 +++ src/backend/utils/error/elog.c | 8 ++ src/include/catalog/pg_proc.dat | 28 ++ src/include/pgstat.h | 39 src/test/regress/expected/rules.out | 7 ++ 9 files changed, 391 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 52a69a5366..6206fefec0 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -3731,6 +3731,75 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i + + + session_time double precision + + + Time spent by database sessions in this database, in milliseconds + (note that statistics are only updated when the state of a session + changes, so if sessions have been idle for a long time, this idle time + won't be included) + + + + + + active_time double precision + + + Time spent executing SQL statements in this database, in milliseconds + + + + + + idle_in_transaction_time double precision + + + Time spent idling while in a transaction in this database, in milliseconds + + + + + + sessions bigint + + + Total number of sessions established to this database + + + + + + sessions_abandoned bigint + + + Number of database sessions to this database that were terminated + because connection to the client was lost + + + + + + sessions_fatal bigint + + + Number of database sessions to this database that were terminated + by fatal errors + + + + + + sessions_killed bigint + + + Number of database sessions to this database that were terminated + by operator intervention + + + stats_reset timestamp with time zone diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index b140c210bc..3a2569b135 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -924,6 +924,13 @@ CREATE VIEW pg_stat_database AS pg_stat_get_db_checksum_last_failure(D.oid) AS checksum_last_failure, pg_stat_get_db_blk_read_time(D.oid) AS blk_read_time, pg_stat_get_db_blk_write_time(D.oid) AS blk_write_time, +pg_stat_get_db_session_time(D.oid) AS session_time, +pg_stat_get_db_active_time(D.oid) AS active_time, +pg_stat_get_db_idle_in_transaction_time(D.oid) AS idle_in_transaction_time, +pg_stat_get_db_sessions(D.oid) AS sessions, 
+pg_stat_get_db_sessions_abandoned(D.oid) AS sessions_abandoned, +pg_stat_get_db_sessions_fatal(D.oid) AS sessions_fatal, +pg_stat_get_db_sessions_killed(D.oid) AS sessions_killed, pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset FROM ( SELECT 0 AS oid, NULL::name AS datname diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 7c75a25d21..a2337b78f1 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -258,6 +258,9 @@ static int pgStatXactCommit = 0; static int pgStatXactRollback = 0; PgStat_Counter pgStatBlockReadTime = 0; PgStat_Counter pgStatBlockWriteTime = 0; +static PgStat_Counter pgStatActiveTime = 0; +static PgStat_Counter pgStatTransactionIdleTime = 0; +SessionE
Re: Add session statistics to pg_stat_database
On Sun, 2020-12-13 at 17:49 +0100, Magnus Hagander wrote: > > > I am considering the cases > > > > > > 1) client just went away (currently "aborted") > > > 2) death by FATAL error > > > 3) killed by the administrator (or shutdown) > > > > I named the three counters "sessions_client_eof", "sessions_fatal" and > > "sessions_killed", but I am not wedded to these bike shed colors. > > In true bikeshedding mode, I'm not entirely happy with sessions_client_eof, > but I'm also not sure I have a better suggestion. Maybe just "sessions_lost" > or "sessions_connlost", which is basically the terminology that the > documentation uses? > Maybe it's just me, but I don't really like the eof terminology here. > > What do you think about that? Or does somebody else have an opinion here? I slept over it, and came up with "sessions_abandoned". > In today's dept of small things I noticed: > > + if (disconnect) > + msg.m_disconnect = pgStatSessionEndCause; > > in the non-disconnect state, that variable is left uninitialized, isn't it? > It does end up getting ignored later, but to be more future proof the enum > should probably > have a value specifically for "not disconnected yet"? Yes. I named it DISCONNECT_NOT_YET. > + case DISCONNECT_CLIENT_EOF: > + ++(dbentry->n_sessions_client_eof); > + break; > > The normal syntax we'd use for that would be > dbentry->n_sessions_client_eof++; Ok, changed. > + typedef enum sessionEndType { > > To be consistent with the other enums in the same place, seems this should be > SessionEndType. True. I have renamed the type. Attached is patch version 9. Added goodie: I ran pgindent on it. Yours, Laurenz Albe From b40e34141c80ff59c0005f430bd8c273918eb7bb Mon Sep 17 00:00:00 2001 From: Laurenz Albe Date: Tue, 15 Dec 2020 13:46:44 +0100 Subject: [PATCH] Add session statistics to pg_stat_database If "track_counts" is active, track the following per database: - total number of connections - number of sessions that ended by loss of network connection, fatal errors and operator intervention - total time spent in database sessions - total time spent executing queries - total idle in transaction time This is useful to check if connection pooling is working. It also helps to estimate the size of the connection pool required to keep the database busy, which depends on the percentage of the transaction time that is spent idling. 
Discussion: https://postgr.es/m/b07e1f9953701b90c66ed368656f2aef40cac4fb.ca...@cybertec.at Reviewed-By: Soumyadeep Chakraborty, Justin Pryzby, Masahiro Ikeda, Magnus Hagander (This requires a catversion bump, as well as an update to PGSTAT_FILE_FORMAT_ID) --- doc/src/sgml/monitoring.sgml | 69 ++ src/backend/catalog/system_views.sql | 7 ++ src/backend/postmaster/pgstat.c | 134 ++- src/backend/tcop/postgres.c | 10 +- src/backend/utils/adt/pgstatfuncs.c | 94 +++ src/backend/utils/error/elog.c | 8 ++ src/include/catalog/pg_proc.dat | 28 ++ src/include/pgstat.h | 39 src/test/regress/expected/rules.out | 7 ++ 9 files changed, 391 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 52a69a5366..6206fefec0 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -3731,6 +3731,75 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i + + + session_time double precision + + + Time spent by database sessions in this database, in milliseconds + (note that statistics are only updated when the state of a session + changes, so if sessions have been idle for a long time, this idle time + won't be included) + + + + + + active_time double precision + + + Time spent executing SQL statements in this database, in milliseconds + + + + + + idle_in_transaction_time double precision + + + Time spent idling while in a transaction in this database, in milliseconds + + + + + + sessions bigint + + + Total number of sessions established to this database + + + + + + sessions_abandoned bigint + + + Number of database sessions to this database that were terminated + because connection to the client was lost + + + + + + sessions_fatal bigint + + + Number of database sessions to this database that were terminated + by fatal errors + + + + + + sessions_killed bigint + + + Number of database sessions to this database that were terminated + by operator intervention + + + stats_reset timestamp with time zone diff --git a
Re: [HACKERS] logical decoding of two-phase transactions
On Mon, Dec 14, 2020 at 2:59 PM Amit Kapila wrote: > Today, I looked at one of the issues discussed earlier in this thread [1] which is that decoding can block (or deadlock can happen) when the user explicitly locks the catalog relation (like Lock pg_class) or perform Cluster on non-relmapped catalog relations (like Cluster pg_trigger using pg_class_oid_index; and the user_table on which we have performed any operation has a trigger) in the prepared xact. As discussed previously, we don't have a problem when user tables are exclusively locked because during decoding we don't acquire any lock on those and in fact, we have a test case for the same in the patch. In the previous discussion, most people seem to be of opinion that we should document it in a category "don't do that", or prohibit to prepare transactions that lock system tables in the exclusive mode as any way that can block the entire system. The other possibility could be that the plugin can allow enabling lock_timeout when it wants to allow decoding of two-phase xacts and if the timeout occurs it tries to fetch by disabling two-phase option provided by the patch. I think it is better to document this as there is no realistic scenario where it can happen. I also think separately (not as part of this patch) we can investigate whether it is a good idea to prohibit prepare for transactions that acquire exclusive locks on catalog relations. Thoughts? [1] - https://www.postgresql.org/message-id/CAMGcDxf83P5SGnGH52%3D_0wRP9pO6uRWCMRwAA0nxKtZvir2_vQ%40mail.gmail.com -- With Regards, Amit Kapila.
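To make the lock_timeout idea concrete, one speculative shape it could take is for an output plugin that opts into two-phase decoding to bound lock waits in its startup callback, and fall back to non-two-phase decoding of that transaction if the timeout fires. The callback below is a made-up sketch, not part of the patch; "5s" is an arbitrary illustrative value.

#include "postgres.h"
#include "replication/logical.h"
#include "replication/output_plugin.h"
#include "utils/guc.h"

/* hypothetical output-plugin startup callback */
static void
demo_startup_cb(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
				bool is_init)
{
	opt->output_type = OUTPUT_PLUGIN_TEXTUAL_OUTPUT;

	/*
	 * Bound the time decoding may spend waiting on a lock held by a
	 * prepared transaction (e.g. an explicit LOCK on pg_class); if the
	 * timeout fires, the plugin could retry that transaction with the
	 * two-phase option disabled, as suggested above.
	 */
	SetConfigOption("lock_timeout", "5s", PGC_USERSET, PGC_S_SESSION);
}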
Re: Parallel Inserts in CREATE TABLE AS
On Tue, Dec 15, 2020 at 5:48 PM Hou, Zhijie wrote: > > A little explanation about how to push down the ctas info in append. > > > > 1. about how to ignore tuple cost in this case. > > IMO, it create gather path under append like the following: > > query_planner > > -make_one_rel > > --set_base_rel_sizes > > ---set_rel_size > > set_append_rel_size (*) > > -set_rel_size > > --set_subquery_pathlist > > ---subquery_planner > > grouping_planner > > -apply_scanjoin_target_to_paths > > --generate_useful_gather_paths > > > > set_append_rel_size seems the right place where we can check and set a flag > > to ignore tuple cost later. > > We can set the flag for two cases when there is no parent path will be > > created(such as : limit,sort,distinct...): > > i) query_level is 1 > > ii) query_level > 1 and we have set the flag in the parent_root. > > > > The case ii) is to check append under append: > > Append > >->Append > >->Gather > >->Other plan > > > > 2.about how to push ctas info down. > > > > We traversing the whole plans tree, and we only care Append and Gather type. > > Gather: It set the ctas dest info and returned true at once if the > > gathernode > > does not have projection. > > Append: It will recursively traversing the subplan of Appendnode and will > > reture true if one of the subplan can be parallel. > > > > +PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps) { > > + bool parallel = false; > > + > > + if(ps == NULL) > > + return parallel; > > + > > + if(IsA(ps, AppendState)) > > + { > > + AppendState *aps = (AppendState *) ps; > > + for(int i = 0; i < aps->as_nplans; i++) > > + { > > + parallel |= > > PushDownCTASParallelInsertState(dest, aps->appendplans[i]); > > + } > > + } > > + else if(IsA(ps, GatherState) && !ps->ps_ProjInfo) > > + { > > + GatherState *gstate = (GatherState *) ps; > > + parallel = true; > > + > > + ((DR_intorel *) dest)->is_parallel = true; > > + gstate->dest = dest; > > + ps->plan->plan_rows = 0; > > + } > > + > > + return parallel; > > +} > > So sorry for my miss, my last patch has some mistakes. > Attatch the new one. Thanks for the append patches. Basically your changes look good to me. I'm merging them to the original patch set and adding the test cases to cover these cases. I will post the updated patch set soon. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
RE: Parallel Inserts in CREATE TABLE AS
> From: Hou, Zhijie [mailto:houzj.f...@cn.fujitsu.com] > Sent: Tuesday, December 15, 2020 7:30 PM > To: Bharath Rupireddy > Cc: Amit Kapila ; Luc Vlaming ; > PostgreSQL-development ; Zhihong Yu > ; Dilip Kumar > Subject: RE: Parallel Inserts in CREATE TABLE AS > > > Thanks for the append use case. > > > > Here's my analysis on pushing parallel inserts down even in case the > > top node is Append. > > > > For union cases which need to remove duplicate tuples, we can't push > > the inserts or CTAS dest receiver down. If I'm not wrong, Append node > > is not doing duplicate removal(??), I saw that it's the HashAggregate > > node (which is the top node that removes the duplicate tuples). And > > also for except/except all/intersect/intersect all cases we receive > > HashSetOp nodes on top of Append. So for both cases, our check for > > Gather or Append at the top node is enough to detect this to not allow > parallel inserts. > > > > For union all: > > case 1: We can push the CTAS dest receiver to each Gather node Append > > ->Gather > > ->Parallel Seq Scan > > ->Gather > > ->Parallel Seq Scan > > ->Gather > > ->Parallel Seq Scan > > > > case 2: We can still push the CTAS dest receiver to each Gather node. > > Non-Gather nodes will do inserts as they do now i.e. by sending tuples > > to Append and from there to CTAS dest receiver. > > Append > > ->Gather > > ->Parallel Seq Scan > > ->Seq Scan / Join / any other non-Gather node > > ->Gather > > ->Parallel Seq Scan > > ->Seq Scan / Join / any other non-Gather node > > > > case 3: We can push the CTAS dest receiver to Gather Gather > > ->Parallel Append > > ->Parallel Seq Scan > > ->Parallel Seq Scan > > > > case 4: We can push the CTAS dest receiver to Gather Gather > > ->Parallel Append > > ->Parallel Seq Scan > > ->Parallel Seq Scan > > ->Seq Scan / Join / any other non-Gather node > > > > Please let me know if I'm missing any other possible use case. > > > > Thoughts? > > > Yes, The analysis looks right to me. > > > > As suggested by Amit earlier, I kept the 0001 patch(so far) such that > > it doesn't have the code to influence the planner to consider parallel > > tuple cost as 0. It works on the plan whatever gets generated and > > decides to allow parallel inserts or not. And in the 0002 patch, I > > added the code for influencing the planner to consider parallel tuple > > cost as 0. Maybe we can have a 0003 patch for tests alone. > > > > Once we are okay with the above analysis and use cases, we can > > incorporate the Append changes to respective patches. > > > > Hope that's okay. > > A little explanation about how to push down the ctas info in append. > > 1. about how to ignore tuple cost in this case. > IMO, it create gather path under append like the following: > query_planner > -make_one_rel > --set_base_rel_sizes > ---set_rel_size > set_append_rel_size (*) > -set_rel_size > --set_subquery_pathlist > ---subquery_planner > grouping_planner > -apply_scanjoin_target_to_paths > --generate_useful_gather_paths > > set_append_rel_size seems the right place where we can check and set a flag > to ignore tuple cost later. > We can set the flag for two cases when there is no parent path will be > created(such as : limit,sort,distinct...): > i) query_level is 1 > ii) query_level > 1 and we have set the flag in the parent_root. > > The case ii) is to check append under append: > Append >->Append >->Gather >->Other plan > > 2.about how to push ctas info down. > > We traversing the whole plans tree, and we only care Append and Gather type. 
> Gather: It set the ctas dest info and returned true at once if the gathernode > does not have projection. > Append: It will recursively traversing the subplan of Appendnode and will > reture true if one of the subplan can be parallel. > > +PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps) { > + bool parallel = false; > + > + if(ps == NULL) > + return parallel; > + > + if(IsA(ps, AppendState)) > + { > + AppendState *aps = (AppendState *) ps; > + for(int i = 0; i < aps->as_nplans; i++) > + { > + parallel |= > PushDownCTASParallelInsertState(dest, aps->appendplans[i]); > + } > + } > + else if(IsA(ps, GatherState) && !ps->ps_ProjInfo) > + { > + GatherState *gstate = (GatherState *) ps; > + parallel = true; > + > + ((DR_intorel *) dest)->is_parallel = true; > + gstate->dest = dest; > + ps->plan->plan_rows = 0; > + } > + > + return parallel; > +} So sorry for my miss, my last patch has some mistakes. Attatch the new one. Best regards, houzj 0001-support-pctas-in-append-parallel-inserts.patch Description: 0001-support-pctas-in-append-
RE: libpq debug log
Hi Alvaro san, > There are some things still to do: I worked on some to do. > 1. Is the handling of protocol 2 correct? Protocol 3.0 is implemented in PostgreSQL v7.4 or later. Therefore, most servers and clients today want to connect using 3.0. Also, wishful thinking, I think Protocol 2.0 will no longer be supported. So I didn't develop it aggressively. Another reason I'm concerned about implementing it would make the code less maintainable. > 5. Error messages are still printing the terminating zero byte. I > suggest that it should be suppressed. I suppressed to print terminating zero byte like this; 2020-12-15 00:54:09 UTC < ErrorResponse 100 S "ERROR" V "ERROR" C "42P01" M "relation "a" does not exist" P "15" F "parse_relation.c" L "1379" R "parserOpenTable" 0 I thought about not outputting it, but the document said that if zero, it was the last message, so I made it output "0". > 6. Let's redefine pqTraceMaybeBreakLine() (I renamed from > pqLogLineBreak): Sure. I redefined this function. > 7. Why does it make sense to call pqTraceMaybeBreakLine when > commsource=FROM_FRONTEND? Backend messages include break line. However frontend messages don’t have it. So I call this functions. Pending lists. > 0. XXX comments There were 4 XXX comments in Alvaro san's patch. 3 XXX comments are left in current patch. I will answer this in next e-mail. > 2. We need a mode to suppress print of time; > 3. COPY ... (FORMAT BINARY) emits "invalid protocol" ... not good. > 4. Even in text format, COPY output is not very useful. How can we improve that? Regards, Aya Iwata Fujitsu v10-0001-libpq-trace.patch Description: v10-0001-libpq-trace.patch
RE: Parallel Inserts in CREATE TABLE AS
> Thanks for the append use case. > > Here's my analysis on pushing parallel inserts down even in case the top > node is Append. > > For union cases which need to remove duplicate tuples, we can't push the > inserts or CTAS dest receiver down. If I'm not wrong, Append node is not > doing duplicate removal(??), I saw that it's the HashAggregate node (which > is the top node that removes the duplicate tuples). And also for > except/except all/intersect/intersect all cases we receive HashSetOp nodes > on top of Append. So for both cases, our check for Gather or Append at the > top node is enough to detect this to not allow parallel inserts. > > For union all: > case 1: We can push the CTAS dest receiver to each Gather node Append > ->Gather > ->Parallel Seq Scan > ->Gather > ->Parallel Seq Scan > ->Gather > ->Parallel Seq Scan > > case 2: We can still push the CTAS dest receiver to each Gather node. > Non-Gather nodes will do inserts as they do now i.e. by sending tuples to > Append and from there to CTAS dest receiver. > Append > ->Gather > ->Parallel Seq Scan > ->Seq Scan / Join / any other non-Gather node > ->Gather > ->Parallel Seq Scan > ->Seq Scan / Join / any other non-Gather node > > case 3: We can push the CTAS dest receiver to Gather Gather > ->Parallel Append > ->Parallel Seq Scan > ->Parallel Seq Scan > > case 4: We can push the CTAS dest receiver to Gather Gather > ->Parallel Append > ->Parallel Seq Scan > ->Parallel Seq Scan > ->Seq Scan / Join / any other non-Gather node > > Please let me know if I'm missing any other possible use case. > > Thoughts? Yes, The analysis looks right to me. > As suggested by Amit earlier, I kept the 0001 patch(so far) such that it > doesn't have the code to influence the planner to consider parallel tuple > cost as 0. It works on the plan whatever gets generated and decides to allow > parallel inserts or not. And in the 0002 patch, I added the code for > influencing the planner to consider parallel tuple cost as 0. Maybe we can > have a 0003 patch for tests alone. > > Once we are okay with the above analysis and use cases, we can incorporate > the Append changes to respective patches. > > Hope that's okay. A little explanation about how to push down the ctas info in append. 1. about how to ignore tuple cost in this case. IMO, it create gather path under append like the following: query_planner -make_one_rel --set_base_rel_sizes ---set_rel_size set_append_rel_size (*) -set_rel_size --set_subquery_pathlist ---subquery_planner grouping_planner -apply_scanjoin_target_to_paths --generate_useful_gather_paths set_append_rel_size seems the right place where we can check and set a flag to ignore tuple cost later. We can set the flag for two cases when there is no parent path will be created(such as : limit,sort,distinct...): i) query_level is 1 ii) query_level > 1 and we have set the flag in the parent_root. The case ii) is to check append under append: Append ->Append ->Gather ->Other plan 2.about how to push ctas info down. We traversing the whole plans tree, and we only care Append and Gather type. Gather: It set the ctas dest info and returned true at once if the gathernode does not have projection. Append: It will recursively traversing the subplan of Appendnode and will reture true if one of the subplan can be parallel. 
+PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps)
+{
+	bool parallel = false;
+
+	if(ps == NULL)
+		return parallel;
+
+	if(IsA(ps, AppendState))
+	{
+		AppendState *aps = (AppendState *) ps;
+		for(int i = 0; i < aps->as_nplans; i++)
+		{
+			parallel |= PushDownCTASParallelInsertState(dest, aps->appendplans[i]);
+		}
+	}
+	else if(IsA(ps, GatherState) && !ps->ps_ProjInfo)
+	{
+		GatherState *gstate = (GatherState *) ps;
+		parallel = true;
+
+		((DR_intorel *) dest)->is_parallel = true;
+		gstate->dest = dest;
+		ps->plan->plan_rows = 0;
+	}
+
+	return parallel;
+}

Best regards, houzj

0001-support-pctas-in-append-parallel-inserts.patch
Description: 0001-support-pctas-in-append-parallel-inserts.patch
0002-support-pctas-in-append-tuple-cost-adjustment.patch
Description: 0002-support-pctas-in-append-tuple-cost-adjustment.patch
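As a side note on point 1 above (ignoring the parallel tuple cost when the Gather is created under an Append), here is a rough sketch of the check described there. It is an illustration only and not code from the attached patches: query_level and parent_root are existing PlannerInfo fields, but the ignore_ctas_tuple_cost flag is invented for the example, and the extra condition about no parent path (limit, sort, distinct and so on) is only noted in a comment.

/*
 * Illustration only.  Could set_append_rel_size() mark this level as one
 * where the parallel tuple cost may be treated as zero for CTAS?  The
 * additional requirement that no parent path (limit/sort/distinct ...)
 * will be created above is omitted here for brevity.
 */
static bool
can_ignore_ctas_tuple_cost(PlannerInfo *root)
{
	/* case i): top-level query */
	if (root->query_level == 1)
		return true;

	/* case ii): Append under Append, where the parent already set the flag */
	if (root->query_level > 1 &&
		root->parent_root != NULL &&
		root->parent_root->ignore_ctas_tuple_cost)	/* hypothetical field */
		return true;

	return false;
}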
Re: REINDEX backend filtering
On Mon, Dec 14, 2020 at 3:45 PM Michael Paquier wrote: > > On Thu, Dec 03, 2020 at 05:31:43PM +0800, Julien Rouhaud wrote: > > Now that we have the infrastructure to track indexes that might be corrupted > > due to changes in collation libraries, I think it would be a good idea to > > offer > > an easy way for users to reindex all indexes that might be corrupted. > > Yes. It would be a good thing. > > > The filter is also implemented so that you could cumulate multiple filters, > > so > > it could be easy to add more filtering, for instance:
> >
> > REINDEX (COLLATION 'libc', COLLATION 'not_current') DATABASE mydb;
> >
> > to only rebuild indexes depending on outdated libc collations, or
> >
> > REINDEX (COLLATION 'libc', VERSION 'X.Y') DATABASE mydb;
> >
> > to only rebuild indexes depending on a specific version of libc.
> Deciding on the grammar to use depends on the use cases we would like > to satisfy. From what I heard on this topic, the goal is to reduce > the amount of time necessary to reindex a system so as REINDEX only > works on indexes whose dependent collation versions are not known or > works on indexes in need of a collation refresh (like a reindexdb > --all --collation -j $jobs). What would be the benefit in having more > complexity with library-dependent settings while we could take care > of the use cases that matter the most with a simple grammar? Perhaps > "not_current" is not the best match as a keyword, we could just use > "collation" and handle that as a boolean. As long as we don't need > new operators in the grammar rules..

I'm not sure what the usual DBA pattern is here. If the reindexing runtime is really critical, I'm assuming that at least some people will dig into the library details to see which collations actually broke in the last upgrade, and will want to reindex only those and force the version for the rest of the indexes. And obviously, they probably won't wait until they have multiple collation version dependencies before taking care of that. In that case the filters that would matter would be one to keep only indexes with an outdated collation version, and an additional one for a specific collation name. Or we could have the COLLATION keyword without an additional argument mean all outdated collations, and COLLATION 'collation_name' specify a particular one. This is maybe a bit ugly, and would probably require a different approach for reindexdb.
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On 2020-12-15 03:14, Justin Pryzby wrote: On Sat, Dec 12, 2020 at 01:45:26PM -0600, Justin Pryzby wrote: On Sat, Dec 12, 2020 at 09:20:35AM +0100, Peter Eisentraut wrote: > On 2020-12-11 21:27, Alvaro Herrera wrote: > > By the way-- What did you think of the idea of explicitly marking the > > types used for bitmasks using types bits32 and friends, instead of plain > > int, which is harder to spot? > > If we want to make it clearer, why not turn the thing into a struct, as in > the attached patch, and avoid the bit fiddling altogether.

I like this. It's a lot like what I wrote as [PATCH v31 1/5] ExecReindex and ReindexParams. In my v31 patch, I moved ReindexOptions to a private structure in indexcmds.c, with an "int options" bitmask which is passed to reindex_index() et al. Your patch keeps/puts ReindexOptions in index.h, so it also applies to reindex_index, which I think is good.

So I've rebased this branch on your patch. Some thoughts:
- what about removing the REINDEXOPT_* prefix?
- You created local vars with initialization like "={}". But I thought it's needed to include at least one struct member like "={false}", or else they're not guaranteed to be zeroed?
- You passed the structure across function calls. The usual convention is to pass a pointer.

I think maybe Michael missed this message (?)

I had applied some changes on top of Peter's patch. I squashed those commits now, and also handled ClusterOption and VacuumOption in the same style. Some more thoughts:
- should the structures be named in the plural? "ReindexOptions" etc., since they define *all* the options, not just a single bit.
- For vacuum, do we even need a separate structure, or should the members be directly within VacuumParams? It's a bit odd to write params.options.verbose, especially since there are also ternary options which are directly within params.

This is exactly what I thought after looking at Peter's patch. Writing 'params.options.some_option' looks just too verbose. I even started moving all vacuum options into VacuumParams on my own and was in the middle of the process when I realized that there are some places that do not fit well, like:

if (!vacuum_is_relation_owner(RelationGetRelid(onerel),
                              onerel->rd_rel,
                              params->options & VACOPT_ANALYZE))

Here we do not want to set the option permanently, but rather to trigger some additional code path in vacuum_is_relation_owner(), IIUC. With a unified VacuumParams we would have to do:

bool opt_analyze = params->analyze;
...
params->analyze = true;
if (!vacuum_is_relation_owner(RelationGetRelid(onerel),
                              onerel->rd_rel,
                              params))
...
params->analyze = opt_analyze;

to achieve the same, which does not look good to me, short of changing the whole interface. I have found at least one other place like that so far --- vacuum_open_relation() in analyze_rel(). Not sure how we could better cope with such logic.

- Then, for cluster, I think it should be called ClusterParams, and eventually include the tablespaceOid, like what we're doing for Reindex.

I am awaiting feedback on these before going further, since I've done too much rebasing with these ideas going back and forth.

Yes, we have moved a bit from my original patch set in the thread with all this refactoring. However, all the subsequent patches depend heavily on this interface, so let us decide first on the proposed refactoring.

Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres Company
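As a neutral illustration of the bitmask-versus-struct question above (the type, macro, and field names below are invented and are not the definitions from the patches), the two styles look roughly like this. On the "={}" question: an empty initializer is a GNU/C++ extension rather than standard C, whereas writing at least one member, e.g. "= {false}" or "= {0}", is standard and zero-fills the remaining members.

/* Bit-mask style: options packed into an int. */
#define DEMOOPT_VERBOSE			(1 << 0)
#define DEMOOPT_CONCURRENTLY	(1 << 1)

static bool
demo_is_verbose_bits(int options)
{
	return (options & DEMOOPT_VERBOSE) != 0;
}

/* Struct style, in the spirit of the patch under discussion. */
typedef struct DemoReindexOptions
{
	bool		verbose;
	bool		concurrently;
	Oid			tablespaceOid;	/* the field the tablespace work would add */
} DemoReindexOptions;

static bool
demo_is_verbose_struct(const DemoReindexOptions *options)	/* passed by pointer, per the review comment */
{
	return options->verbose;
}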
Re: archive status ".ready" files may be created too early
At Mon, 14 Dec 2020 18:25:23 +, "Bossart, Nathan" wrote in > Apologies for the long delay. > > I've spent a good amount of time thinking about this bug and trying > out a few different approaches for fixing it. I've attached a work- > in-progress patch for my latest attempt. > > On 10/13/20, 5:07 PM, "Kyotaro Horiguchi" wrote:
> > F0F1
> > A F B
> > |-|-|-|
> >seg Xseg X+1 seg X+2
> >
> > Matsumura-san has a concern about the case where there are two (or > > more) partially-flushed segment-spanning records at the same time. > > > > This patch remembers only the last cross-segment record. If we were > > going to flush up to F0 after Record-B had been written, we would fail > > to hold-off archiving seg-X. This patch is based on a assumption that > > that case cannot happen because we don't leave a pending page at the > > time of segment switch and no records don't span over three or more > > segments.
>
> I wonder if these are safe assumptions to make. For your example, if > we've written record B to the WAL buffers, but neither record A nor B > have been written to disk or flushed, aren't we still in trouble?

You're right in that regard. There's a window where a partial record is written once the write location passes F0 after the insertion location has passed F1. However, remembering all spanning records seems like overkill to me. I modified the previous patch so that it remembers the start LSN of the *oldest* cross-segment continuation record in the last run of consecutive bonded segments, and the end LSN of the latest cross-segment continuation record. This doesn't forget past segment boundaries. The region is cleared when the WAL-write LSN goes beyond the remembered end LSN. So the region may contain several WAL segments that are not connected to the next one, but that doesn't matter much.

> Also, is there actually any limit on WAL record length that means that > it is impossible for a record to span over three or more segments?

Even though it is not a hard limit, AFAICS, as mentioned before, the longest possible record is what log_newpages() emits, which is up to about 500 kB for now. I don't think we want to make that any longer. If we set wal_segment_size to 1MB and the block size to 16kB or more, we could have a record spanning three or more segments, but I don't think that is a sane configuration, and that kind of issue could happen elsewhere too.

> Perhaps these assumptions are true, but it doesn't seem obvious to me > that they are, and they might be pretty fragile.

I added an assertion to XLogRecordAssemble() that a record must be shorter than a WAL segment. Does this guarantee that the assumption holds? (The condition is tentative and would need to be adjusted.)

> The attached patch doesn't make use of these assumptions. Instead, we > track the positions of the records that cross segment boundaries in a > small hash map, and we use that to determine when it is safe to mark a > segment as ready for archival. I think this approach resembles > Matsumura-san's patch from June. > > As before, I'm not handling replication, archive_timeout, and > persisting latest-marked-ready through crashes yet. For persisting > the latest-marked-ready segment through crashes, I was thinking of > using a new file that stores the segment number.

Also, the attached patch is a PoC.

regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 5714bd064d61135c41a64ecc39aeff74c25a0e74 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Tue, 15 Dec 2020 16:24:13 +0900
Subject: [PATCH v3] PoC: Avoid archiving a WAL segment that continues to the
 next segment

If the last record of a finished segment continues to the next segment,
we need to defer archiving of the segment until the record is flushed
to the end. Otherwise crash recovery can overwrite the last record of a
segment and history diverges between archive and pg_wal.
---
 src/backend/access/transam/xlog.c       | 187 +++-
 src/backend/access/transam/xloginsert.c |   3 +
 src/backend/replication/walsender.c     |  14 +-
 src/include/access/xlog.h               |   1 +
 4 files changed, 193 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8dd225c2e1..98da521601 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -723,6 +723,16 @@ typedef struct XLogCtlData
 	 */
 	XLogRecPtr	lastFpwDisableRecPtr;
 
+	/* The last segment notified to be archived. Protected by WALWriteLock */
+	XLogSegNo	lastNotifiedSeg;
+
+	/*
+	 * Remember the oldest and newest known segment that ends with a
+	 * continuation record.
+	 */
+	XLogRecPtr	firstSegContRecStart;
+	XLogRecPtr	lastSegContRecEnd;
+
 	slock_t		info_lck;		/* locks shared variables shown above */
 } XLogCtlData;
 
@@ -1160,6 +1170,9 @@ XLogInsertRecord(XLogRecData *rdata,
 	 */
 	if (StartPos / XLOG_BLCKSZ != EndPos / X
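The deferral rule described in the message above can be summarised in a short standalone sketch. This is an illustration only, not the PoC's code: the type and function names are invented, and DemoRecPtr stands in for XLogRecPtr.

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t DemoRecPtr;		/* stand-in for XLogRecPtr */

typedef struct DemoContRecRegion
{
	DemoRecPtr	first_cont_rec_start;	/* start of the oldest boundary-crossing record */
	DemoRecPtr	last_cont_rec_end;		/* end of the newest boundary-crossing record */
} DemoContRecRegion;

/*
 * May a segment ending at seg_end be marked .ready, given how far the WAL
 * has been written out?  Once write_pos passes the end of the newest
 * cross-segment record, the remembered region no longer blocks anything;
 * before that, only segments ending at or before the oldest such record
 * are safe to hand to the archiver.
 */
static bool
demo_may_notify_segment(const DemoContRecRegion *region,
						DemoRecPtr seg_end, DemoRecPtr write_pos)
{
	if (write_pos >= region->last_cont_rec_end)
		return true;
	return seg_end <= region->first_cont_rec_start;
}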
Re: Single transaction in the tablesync worker?
On Thu, Dec 10, 2020 at 8:49 PM Peter Smith wrote: > So I will try to write a patch for the proposed Solution-1.

Hi Amit. FYI, here is my v3 WIP patch for Solution-1. This patch applies on top of the v30 patch set [1] from the other 2PC thread:
[1] https://www.postgresql.org/message-id/CAFPTHDYA8yE6tEmQ2USYS68kNt%2BkM%3DSwKgj%3Djy4AvFD5e9-UTQ%40mail.gmail.com

Although incomplete, it continues to pass all of the make check and src/test/subscription TAP tests.

Coded / WIP:
* the tablesync slot is now permanent instead of temporary
* tablesync slot cleanup (drop) code is added to the DropSubscription and finish_sync_worker functions
* the tablesync worker now allows multiple tx instead of a single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase

TODO / Known Issues:
* The tablesync replication origin/lsn logic all needs to be updated so that tablesync knows where to restart based on information held by the now-permanent slot.
* The current implementation of tablesync slot dropping (e.g. from DROP SUBSCRIPTION) or finish_sync_worker regenerates the tablesync slot name so it knows which slot to drop. That may be OK for normal use cases, but if there is an ALTER SUBSCRIPTION ... SET (slot_name = newname) it would fail to find the tablesync slot. Some redesign may be needed for this part.
* help / comments / cleanup
* There is temporary, excessive "!!>>" logging of mine scattered around, which I added to help my testing during development.

---
Kind Regards, Peter Smith. Fujitsu Australia

v3-0002-2PC-Solution1-WIP-20201215.patch
Description: Binary data
v3-0001-2PC-change-tablesync-slot-to-use-same-two_phase-m.patch
Description: Binary data
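To illustrate the COPYDONE bypass described in the "Coded / WIP" list above, here is a simplified sketch. It is not the WIP patch's code: SUBREL_STATE_COPYDONE is the new state it proposes (its character value is assumed here), and the helper functions are stand-ins.

#include <stdio.h>

#define SUBREL_STATE_COPYDONE	'C'	/* value assumed for the example */

static void demo_copy_table(void)       { printf("performing initial copy\n"); }
static void demo_set_state(char state)  { printf("persisting state '%c'\n", state); }

/* Simplified relaunch logic: skip the copy if a previous worker finished it. */
static void
demo_tablesync_start(char relstate)
{
	if (relstate == SUBREL_STATE_COPYDONE)
	{
		printf("copy already done, proceeding to catch-up\n");
		return;
	}
	demo_copy_table();
	demo_set_state(SUBREL_STATE_COPYDONE);
}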
Re: Minor documentation error regarding streaming replication protocol
On Thu, 2020-12-03 at 13:14 -0500, Bruce Momjian wrote: > Andres objected (in a separate conversation) to forcing a binary- > > format > > value on a client that didn't ask for one. He suggested that we > > mandate > > that the data is ASCII-only (for both filename and content), > > closing > > the gap Michael raised[1]; and then just declare all values to be > > text > > format. > > How do we mandate that? Just mention it in the docs and C comments?

Attached a simple patch that enforces an all-ASCII restore point name rather than documenting the possibility of non-ASCII text.

Regards, Jeff Davis

[1] https://www.postgresql.org/message-id/3679218.1602770626%40sss.pgh.pa.us

From b9381a62ab9616da3062827f84b72844748fd3fc Mon Sep 17 00:00:00 2001
From: Jeff Davis
Date: Tue, 15 Dec 2020 01:06:21 -0800
Subject: [PATCH] Enforce ASCII restore point names.

The TIMELINE_HISTORY replication command assumes that the file name
and contents are valid ASCII, but the timeline history file may
contain restore point names.

Discussion: https://postgr.es/m/20201203181456.gk20...@momjian.us
---
 doc/src/sgml/protocol.sgml             |  4 +---
 src/backend/access/transam/xlogfuncs.c | 27 ++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 4899bacda7..b18a4bceac 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1861,9 +1861,7 @@ The commands accepted in replication mode are:
       Requests the server to send over the timeline history file for
       timeline tli. Server replies with a
-      result set of a single row, containing two fields. While the fields
-      are labeled as text, they effectively return raw bytes,
-      with no encoding conversion:
+      result set of a single row, containing two fields:

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 290658b22c..48daed56f6 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -44,6 +44,8 @@
 static StringInfo label_file;
 static StringInfo tblspc_map_file;
 
+static bool is_all_ascii(const char *str);
+
 /*
  * pg_start_backup: set up for taking an on-line backup dump
  *
@@ -309,6 +311,16 @@ pg_create_restore_point(PG_FUNCTION_ARGS)
 
 	restore_name_str = text_to_cstring(restore_name);
 
+	/*
+	 * The restore point name may be included in a timeline history file. The
+	 * system assumes that the contents are ASCII, so enforce an all-ASCII
+	 * restore point name.
+	 */
+	if (!is_all_ascii(restore_name_str))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("restore point name must be valid ASCII")));
+
 	if (strlen(restore_name_str) >= MAXFNAMELEN)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -784,3 +796,18 @@ pg_promote(PG_FUNCTION_ARGS)
 				 (errmsg("server did not promote within %d seconds",
 						 wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Check a string to see if it is pure ASCII
+ */
+static bool
+is_all_ascii(const char *str)
+{
+	while (*str)
+	{
+		if (IS_HIGHBIT_SET(*str))
+			return false;
+		str++;
+	}
+	return true;
+}
-- 
2.17.1
Re: Parallel Inserts in CREATE TABLE AS
On Tue, Dec 15, 2020 at 2:06 PM Bharath Rupireddy wrote: > > On Mon, Dec 14, 2020 at 6:08 PM Hou, Zhijie wrote: > > Currently with the patch, we can allow parallel CTAS when topnode is Gather. > > When top-node is Append and Gather is the sub-node of Append, I think we > > can still enable > > Parallel CTAS by pushing Parallel CTAS down to the sub-node Gather, such as: > > > > Append > > -->Gather > > ->Create table > > ->Seqscan > > -->Gather > > ->create table > > ->Seqscan > > > > And the use case seems common to me, such as: > > select * from A where xxx union all select * from B where xxx; > > Thanks for the append use case. > > Here's my analysis on pushing parallel inserts down even in case the > top node is Append. > > For union cases which need to remove duplicate tuples, we can't push > the inserts or CTAS dest receiver down. If I'm not wrong, Append node > is not doing duplicate removal(??), I saw that it's the HashAggregate > node (which is the top node that removes the duplicate tuples). And > also for except/except all/intersect/intersect all cases we receive > HashSetOp nodes on top of Append. So for both cases, our check for > Gather or Append at the top node is enough to detect this to not allow > parallel inserts. > > For union all: > case 1: We can push the CTAS dest receiver to each Gather node > Append > ->Gather > ->Parallel Seq Scan > ->Gather > ->Parallel Seq Scan > ->Gather > ->Parallel Seq Scan > > case 2: We can still push the CTAS dest receiver to each Gather node. > Non-Gather nodes will do inserts as they do now i.e. by sending tuples > to Append and from there to CTAS dest receiver. > Append > ->Gather > ->Parallel Seq Scan > ->Seq Scan / Join / any other non-Gather node > ->Gather > ->Parallel Seq Scan > ->Seq Scan / Join / any other non-Gather node > > case 3: We can push the CTAS dest receiver to Gather > Gather > ->Parallel Append > ->Parallel Seq Scan > ->Parallel Seq Scan > > case 4: We can push the CTAS dest receiver to Gather > Gather > ->Parallel Append > ->Parallel Seq Scan > ->Parallel Seq Scan > ->Seq Scan / Join / any other non-Gather node > > Please let me know if I'm missing any other possible use case. > > Thoughts? Your analysis looks right to me. > > I attach a WIP patch which just show the possibility of this feature. > > The patch is based on the latest v11-patch. > > > > What do you think? > > As suggested by Amit earlier, I kept the 0001 patch(so far) such that > it doesn't have the code to influence the planner to consider parallel > tuple cost as 0. It works on the plan whatever gets generated and > decides to allow parallel inserts or not. And in the 0002 patch, I > added the code for influencing the planner to consider parallel tuple > cost as 0. Maybe we can have a 0003 patch for tests alone. Yeah, that makes sense and it will be easy for the review. > Once we are okay with the above analysis and use cases, we can > incorporate the Append changes to respective patches. > > Hope that's okay. Make sense to me. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: libpq debug log
On Tue, Oct 27, 2020 at 3:23 AM Alvaro Herrera wrote: > > I've been hacking at this patch again. There were a few things I wasn't > too clear about, so I reordered the code and renamed the routines to try > to make it easier to follow. >

Hi, hopefully Iwata-san will return to looking at this soon. I have tried the latest patch a little (using "psql" as my client), and have a few small comments so far:

- In pqTraceInit(), in the (admittedly rare) case that the fe_msg malloc() fails, I'd NULL out be_msg too after free(), rather than leave it dangling (because if pqTraceInit() were ever invoked again, as the comment says it could be, it would result in previously freed memory being accessed ...):

conn->fe_msg = malloc(MAX_FRONTEND_MSGS * sizeof(pqFrontendMessage));
if (conn->fe_msg == NULL)
{
    free(conn->be_msg);
    conn->be_msg = NULL;
    return false;
}

- >3. COPY ... (FORMAT BINARY) emits "invalid protocol" ... not good.

That seemed to happen for me only when COPYing binary format to stdout:

UnknownCommand :::Invalid Protocol

- >5. Error messages are still printing the terminating zero byte. I >suggest that it should be suppressed.

Perhaps there's a more elegant way of doing it, but I got rid of the logging of the zero byte using the following change to pqLogMsgByte1(), though there still seems to be a trailing space issue:

case LOG_CONTENTS:
-   fprintf(conn->Pfdebug, "%c ", v);
+   if (v != '\0')
+       fprintf(conn->Pfdebug, "%c ", v);
    pqTraceMaybeBreakLine(sizeof(v), conn);
    break;

Regards, Greg Nancarrow Fujitsu Australia
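On the trailing-space issue mentioned above, one possible approach (shown here as a standalone illustration rather than as a change to the actual patch) is to emit the separating space before each content byte instead of after it, so that suppressing the terminating zero byte leaves no dangling separator either:

#include <stdbool.h>
#include <stdio.h>

/*
 * Illustration only: print content bytes separated by single spaces,
 * skipping the terminating zero byte, so the traced line ends cleanly.
 */
static void
demo_print_contents(FILE *out, const char *buf, int len)
{
	bool		first = true;

	for (int i = 0; i < len; i++)
	{
		if (buf[i] == '\0')
			continue;			/* suppress the terminating zero byte */
		if (!first)
			fputc(' ', out);
		fputc(buf[i], out);
		first = false;
	}
	fputc('\n', out);
}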
Re: Parallel Inserts in CREATE TABLE AS
On Mon, Dec 14, 2020 at 6:08 PM Hou, Zhijie wrote: > Currently with the patch, we can allow parallel CTAS when topnode is Gather. > When top-node is Append and Gather is the sub-node of Append, I think we can > still enable > Parallel CTAS by pushing Parallel CTAS down to the sub-node Gather, such as:
>
> Append
> -->Gather
> ->Create table
> ->Seqscan
> -->Gather
> ->create table
> ->Seqscan
>
> And the use case seems common to me, such as:
> select * from A where xxx union all select * from B where xxx;

Thanks for the append use case.

Here's my analysis on pushing parallel inserts down even when the top node is Append.

For union cases which need to remove duplicate tuples, we can't push the inserts or the CTAS dest receiver down. If I'm not wrong, the Append node is not doing the duplicate removal(??); I saw that it's the HashAggregate node (which is the top node) that removes the duplicate tuples. And also for except/except all/intersect/intersect all cases we receive HashSetOp nodes on top of Append. So for both cases, our check for Gather or Append at the top node is enough to detect this and not allow parallel inserts.

For union all:

case 1: We can push the CTAS dest receiver to each Gather node
Append
->Gather
->Parallel Seq Scan
->Gather
->Parallel Seq Scan
->Gather
->Parallel Seq Scan

case 2: We can still push the CTAS dest receiver to each Gather node. Non-Gather nodes will do inserts as they do now, i.e. by sending tuples to Append and from there to the CTAS dest receiver.
Append
->Gather
->Parallel Seq Scan
->Seq Scan / Join / any other non-Gather node
->Gather
->Parallel Seq Scan
->Seq Scan / Join / any other non-Gather node

case 3: We can push the CTAS dest receiver to Gather
Gather
->Parallel Append
->Parallel Seq Scan
->Parallel Seq Scan

case 4: We can push the CTAS dest receiver to Gather
Gather
->Parallel Append
->Parallel Seq Scan
->Parallel Seq Scan
->Seq Scan / Join / any other non-Gather node

Please let me know if I'm missing any other possible use case.

Thoughts?

> I attach a WIP patch which just show the possibility of this feature. > The patch is based on the latest v11-patch. > > What do you think?

As suggested by Amit earlier, I kept the 0001 patch (so far) such that it doesn't have the code to influence the planner to consider the parallel tuple cost as 0. It works on the plan whatever gets generated and decides to allow parallel inserts or not. And in the 0002 patch, I added the code for influencing the planner to consider the parallel tuple cost as 0. Maybe we can have a 0003 patch for tests alone.

Once we are okay with the above analysis and use cases, we can incorporate the Append changes into the respective patches.

Hope that's okay.

With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
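To make the top-node test above concrete, here is a rough sketch (an illustration only, not the patch's code; it assumes the executor-side PlanState tree is what gets inspected): a Gather or Append at the top keeps the push-down possibility open, while shapes such as HashAggregate or HashSetOp on top, i.e. the UNION/INTERSECT/EXCEPT cases, fall back to the normal path.

/*
 * Illustration only.  Whether the top of the plan even qualifies for pushing
 * the CTAS dest receiver down; the final decision still depends on the
 * Gather nodes found underneath, as discussed in this thread.
 */
static bool
demo_top_node_qualifies(PlanState *ps)
{
	if (ps == NULL)
		return false;

	if (IsA(ps, GatherState))
		return true;			/* Gather, possibly over a Parallel Append */

	if (IsA(ps, AppendState))
		return true;			/* UNION ALL style Append over Gathers */

	/* HashAggregate, HashSetOp, etc. on top: no push-down */
	return false;
}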
Re: TAP tests aren't using the magic words for Windows file access
Hello hackers,

Are there any plans to backport the patch to earlier versions of Postgres?
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=114541d58e5970e51b78b77b65de16210beaab43

We rarely see the issue with the pg_ctl/004_logrotate test on the REL_12_STABLE branch. On my notebook I can easily reproduce the "Permission denied at src/test/perl/TestLib.pm line 259" error with the small change below, but the same test does not fail on version 13, or on version 12 with the TestLib patch.

diff --git a/src/bin/pg_ctl/t/004_logrotate.pl b/src/bin/pg_ctl/t/004_logrotate.pl
index bc39abd23e4..e49e159bc84 100644
--- a/src/bin/pg_ctl/t/004_logrotate.pl
+++ b/src/bin/pg_ctl/t/004_logrotate.pl
@@ -72,7 +72,7 @@ for (my $attempts = 0; $attempts < $max_attempts; $attempts++)
 {
 	$new_current_logfiles = slurp_file($node->data_dir . '/current_logfiles');
 	last if $new_current_logfiles ne $current_logfiles;
-	usleep(100_000);
+	usleep(1);
 }
 note "now current_logfiles = $new_current_logfiles";

On 2019-11-22 20:22, Andrew Dunstan wrote: On 11/22/19 3:55 AM, Juan José Santamaría Flecha wrote: On Fri, Nov 22, 2019 at 9:00 AM Michael Paquier wrote: On Thu, Nov 21, 2019 at 08:09:38PM +0100, Juan José Santamaría Flecha wrote: > I think Perl's open() is a bad candidate for an overload, so I will update > the previous patch that only touches slurp_file(). FWIW, I don't like much the approach of patching only slurp_file(). What gives us the guarantee that we won't have this discussion again in a couple of months or years once a new caller of open() is added for some new TAP tests, and that it has the same problems with multi-process concurrency? I agree on that, from a technical stand point, overloading open() is probably the best solution for the reasons above mentioned. My doubts come from the effort such a solution will take and its maintainability, also taking into account that there are not that many calls to open() in "src/test/perl".

I think the best course is for us to give your latest patch an outing on the buildfarm and verify that the issues seen with slurp_file disappear. That shouldn't take us more than a week or two to see - drongo has had 6 such failures in the last 11 days on master. After that we can discuss how much further we might want to take it.

cheers

andrew

--
regards,
Roman Zharkov
Re: HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions)
On Mon, Dec 14, 2020 at 01:59:03PM -0500, Tom Lane wrote: > * Should we just have a blanket insistence that all callers supply > HASH_ELEM? The default sizes that dynahash.c uses without that are > undocumented and basically useless.

+1

> we should just rip out all those memsets as pointless, since there's > basically no case where you'd use the memset to fill a field that > you meant to pass as zero. The fact that hash_create() doesn't > read fields it's not told to by a flag means we should not need > the memsets to avoid uninitialized-memory reads.

On Mon, Dec 14, 2020 at 06:55:20PM -0500, Tom Lane wrote: > Here's a rolled-up patch that does some further documentation work > and gets rid of the unnecessary memset's as well.

+1 on removing the memset() calls. That said, it's not a big deal if more creep in over time; it doesn't qualify as a project policy violation.

> @@ -329,6 +328,11 @@ InitShmemIndex(void)
>  * whose maximum size is certain, this should be equal to max_size; that
>  * ensures that no run-time out-of-shared-memory failures can occur.
>  *
> + * *infoP and hash_flags should specify at least the entry sizes and key

s/should/must/
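As an illustration of the calling convention being settled on above (not code from the rolled-up patch; the table and entry names are invented), a caller fills in only the HASHCTL fields that correspond to the flags it passes, with no memset and with HASH_ELEM always supplied:

typedef struct DemoEntry
{
	Oid			key;			/* hash key; must be first */
	int			count;
} DemoEntry;

static HTAB *
demo_create_table(void)
{
	HASHCTL		ctl;

	/* no memset: hash_create() only reads fields named by the flags */
	ctl.keysize = sizeof(Oid);
	ctl.entrysize = sizeof(DemoEntry);

	return hash_create("demo table",	/* name used in error messages */
					   128,				/* initial size hint */
					   &ctl,
					   HASH_ELEM | HASH_BLOBS);
}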