Re: pg_upgrade fails with in-place tablespace
On Tue, Sep 05, 2023 at 10:28:44AM +0800, Rui Zhao wrote: > It would be highly appreciated if the official PG could also > incorporate support for this feature. This is a development option, and documented as such. You may also want to be aware of the fact that tablespaces in the data folder itself are discouraged, because they don't really make sense, and that we generate a WARNING if trying to do that since 33cb8ff6aa11. I don't really have more to add, but others interested in this topic are of course free to share their opinions and/or comments. -- Michael
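For context, a minimal illustration of the developer option being referred to, assuming a superuser session (the tablespace names and the path are made up): allow_in_place_tablespaces lets CREATE TABLESPACE take an empty LOCATION and create the directory directly under pg_tblspc/, whereas pointing a tablespace at a path inside the data directory is what draws the WARNING added in 33cb8ff6aa11.

SET allow_in_place_tablespaces = on;            -- developer option, off by default
CREATE TABLESPACE regress_tblspc LOCATION '';   -- created in-place as pg_tblspc/<oid>
-- An absolute path that resolves to somewhere inside the data directory is
-- discouraged instead, and generates a WARNING since 33cb8ff6aa11:
CREATE TABLESPACE bad_tblspc LOCATION '/path/to/pgdata/ts_dir';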
RE: [PoC] pg_upgrade: allow to upgrade publisher node
On Friday, September 1, 2023 9:05 PM Kuroda, Hayato/黒田 隼人 wrote: > Hi, Thanks for updating the patch. I have a comment about the check related to the wal_status. Currently, there are a few places where we check the wal_status of slots, e.g. check_old_cluster_for_valid_slots(), get_loadable_libraries(), and get_old_cluster_logical_slot_infos(). But as discussed in another thread[1], there are some kinds of WAL that will be written while pg_upgrade is checking the old cluster, which could cause the WAL size to exceed max_slot_wal_keep_size. In this case, a checkpoint will remove the WAL required by the slots and invalidate those slots (the wal_status gets changed as well). Based on this, it's possible that the slots we get each time when checking wal_status are different, because they may get changed in between these checks. This may not cause serious problems for now, because we will either copy all the slots including the invalidated ones when upgrading, or we report ERROR. But I feel it's better to get a consistent result each time we check the slots, to close the possibility of problems in the future. So, I feel we could centralize the wal_status check and the slot fetch, so that even if some slot's status changes after that, it won't risk affecting our check. What do you think? [1] https://www.postgresql.org/message-id/flat/CAA4eK1LLik2818uzYqS73O%2BHe5LK_%2B%3DkthyZ6hwT6oe9TuxycA%40mail.gmail.com#16efea0a76d623b1335e73fc1e28f5ef Best Regards, Hou zj
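For reference, the state in question is what the old cluster exposes through pg_replication_slots; a sketch of the kind of check whose result can change between reads if a checkpoint invalidates slots in the meantime (the filter is illustrative):

SELECT slot_name, wal_status, confirmed_flush_lsn
  FROM pg_replication_slots
 WHERE slot_type = 'logical';
-- wal_status is one of 'reserved', 'extended', 'unreserved' or 'lost';
-- a slot invalidated by a checkpoint shows up as 'lost'.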
Re: Autogenerate some wait events code and documentation
On Mon, Sep 04, 2023 at 02:14:58PM +0200, Drouvot, Bertrand wrote: > # Build the descriptions. There are in camel-case. > # LWLock and Lock classes do not need any modifications. > > Nit: 2 whitespaces before "There are in camel" The whitespaces are intentional, the typo in the first line is not. > + my $waiteventdescription = ''; > + if ( $waitclassname eq 'WaitEventLWLock' > > Nit: Too many whitespace after the "if (" ?? (I guess pgperltidy would > fix it). Here, perltidy is indeed complaining, but it is adding a few whitespaces. > Then, the only diff is: > > < Client,WalSenderWaitWal,Waiting for WAL to be flushed in WAL sender process > --- > > Client,WalSenderWaitForWAL,Waiting for WAL to be flushed in WAL sender > > process > > That said, it looks good to me. Ah, good catch. I did not think about cross-checking the data in the new view before and after the patch set. This rename needs to happen in 0001. Please find v5 attached. How does that look? -- Michael From 878ecc7a13ef9086ec3dc0cc70bff1f5d8a22eea Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Tue, 5 Sep 2023 14:32:37 +0900 Subject: [PATCH v5 1/2] Rename wait events with more consistent camelcase style This will help in the introduction of more simplifications with the generation of wait event data using wait_event_names.txt. The event names used the same case-insensitive characters, hence applying lower() or upper() to the monitoring queries allows the detection of the same events as before this change. --- src/backend/libpq/pqmq.c | 2 +- src/backend/replication/walsender.c | 2 +- src/backend/storage/ipc/shm_mq.c | 6 +- .../utils/activity/wait_event_names.txt | 90 +-- 4 files changed, 50 insertions(+), 50 deletions(-) diff --git a/src/backend/libpq/pqmq.c b/src/backend/libpq/pqmq.c index 253181f47a..38b042804c 100644 --- a/src/backend/libpq/pqmq.c +++ b/src/backend/libpq/pqmq.c @@ -182,7 +182,7 @@ mq_putmessage(char msgtype, const char *s, size_t len) break; (void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0, - WAIT_EVENT_MQ_PUT_MESSAGE); + WAIT_EVENT_MESSAGE_QUEUE_PUT_MESSAGE); ResetLatch(MyLatch); CHECK_FOR_INTERRUPTS(); } diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 46e48492ac..e250b0567e 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -1654,7 +1654,7 @@ WalSndWaitForWal(XLogRecPtr loc) if (pq_is_send_pending()) wakeEvents |= WL_SOCKET_WRITEABLE; - WalSndWait(wakeEvents, sleeptime, WAIT_EVENT_WAL_SENDER_WAIT_WAL); + WalSndWait(wakeEvents, sleeptime, WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL); } /* reactivate latch so WalSndLoop knows to continue */ diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c index d134a09a19..06d6b73527 100644 --- a/src/backend/storage/ipc/shm_mq.c +++ b/src/backend/storage/ipc/shm_mq.c @@ -1017,7 +1017,7 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data, * cheaper than setting one that has been reset. */ (void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0, - WAIT_EVENT_MQ_SEND); + WAIT_EVENT_MESSAGE_QUEUE_SEND); /* Reset the latch so we don't spin. */ ResetLatch(MyLatch); @@ -1163,7 +1163,7 @@ shm_mq_receive_bytes(shm_mq_handle *mqh, Size bytes_needed, bool nowait, * setting one that has been reset. */ (void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0, - WAIT_EVENT_MQ_RECEIVE); + WAIT_EVENT_MESSAGE_QUEUE_RECEIVE); /* Reset the latch so we don't spin. 
*/ ResetLatch(MyLatch); @@ -1252,7 +1252,7 @@ shm_mq_wait_internal(shm_mq *mq, PGPROC **ptr, BackgroundWorkerHandle *handle) /* Wait to be signaled. */ (void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0, - WAIT_EVENT_MQ_INTERNAL); + WAIT_EVENT_MESSAGE_QUEUE_INTERNAL); /* Reset the latch so we don't spin. */ ResetLatch(MyLatch); diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt index 13774254d2..5287ad5f63 100644 --- a/src/backend/utils/activity/wait_event_names.txt +++ b/src/backend/utils/activity/wait_event_names.txt @@ -45,15 +45,15 @@ Section: ClassName - WaitEventActivity WAIT_EVENT_ARCHIVER_MAIN ArchiverMain "Waiting in main loop of archiver process." -WAIT_EVENT_AUTOVACUUM_MAIN AutoVacuumMain "Waiting in main loop of autovacuum launcher process." -WAIT_EVENT_BGWRITER_HIBERNATE BgWriterHibernate "Waiting in background writer process, hibernating." -WAIT_EVENT_BGWRITER_MAIN BgWriterMain "Waiting in main loop of background writer process." +WAIT_EVENT_AUTOVACUUM_MAIN AutovacuumMain "Waiting in main loop of autovacuum launcher process." +WAIT_EVENT_BGWRITER_HIBERNATE BgwriterHibernate "Waiting in background writer process, hibernating." +WAIT_EVENT_BGWRITER_M
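For context, the renamed strings are what surface in pg_stat_activity.wait_event, so a typical monitoring query looks like the sketch below; per the commit message above, comparing the names case-insensitively keeps matching the same events across the renames.

SELECT pid, wait_event_type, wait_event
  FROM pg_stat_activity
 WHERE wait_event IS NOT NULL;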
Re: Impact of checkpointer during pg_upgrade
On Tue, Sep 5, 2023 at 10:09 AM Dilip Kumar wrote: > > On Tue, Sep 5, 2023 at 9:38 AM Amit Kapila wrote: > > > > On Mon, Sep 4, 2023 at 4:19 PM Dilip Kumar wrote: > > > > > > Said that there is a possibility that some of the slots which got > > > invalidated even on the previous checkpoint might get the same LSN as > > > the slot which got invalidated later if there is no activity between > > > these two checkpoints. So if we go with this approach then there is > > > some risk of migrating some of the slots which were already > > > invalidated even before the shutdown checkpoint. > > > > > > > I think even during the shutdown checkpoint, after writing shutdown > > checkpoint WAL, we can invalidate some slots that in theory are safe > > to migrate/copy because all the WAL for those slots would also have > > been sent. So, those would be similar to what we invalidate during the > > upgrade, no? > > Thats correct > > If so, I think it is better to have the same behavior for > > invalidated slots irrespective of the time it gets invalidated. We can > > either give an error for such slots during the upgrade (which means > > disallow the upgrade) or simply ignore such slots during the upgrade. > > I would prefer ERROR but if we want to ignore such slots, we can > > probably inform the user in some way about ignored slots, so that she > > can later drop corresponding subscritions or recreate such slots and > > do the required sync-up to continue the replication. > > Earlier I was thinking that ERRORing out is better so that the user > can take necessary action for the invalidated slots and then retry > upgrade. But thinking again I could not find what are the advantages > of this because if we error out then also users need to restart the > old cluster again and have to drop the corresponding subscriptions > OTOH if we allow the upgrade by ignoring the slots then also the user > has to take similar actions on the new cluster? So what's the > advantage of erroring out over upgrading? > The advantage is that we avoid inconvenience caused to users because Drop Subscription will be unsuccessful as the corresponding slots are not present. So users first need to disassociate slots for the subscription and then drop the subscription. Also, I am not sure leaving behind some slots doesn't have any other impact, otherwise, why don't we drop such slots from time to time after they are marked invalidated during normal operation? If users really want to leave behind such invalidated slots after upgrade, we can even think of providing some option like "exclude_invalid_logical_slots". -- With Regards, Amit Kapila.
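For reference, the extra steps being described for a subscription whose remote slot was not migrated look roughly like this on the new cluster (the subscription name is hypothetical):

ALTER SUBSCRIPTION mysub DISABLE;
ALTER SUBSCRIPTION mysub SET (slot_name = NONE);  -- disassociate the missing slot
DROP SUBSCRIPTION mysub;                          -- now succeeds without touching the remote slot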
Re: persist logical slots to disk during shutdown checkpoint
On Tue, Sep 5, 2023 at 9:58 AM Amit Kapila wrote: > > On Tue, Sep 5, 2023 at 9:10 AM Dilip Kumar wrote: > > > > The comments don't mention anything about why we are just flushing at > > the shutdown checkpoint. I mean the checkpoint is not that frequent > > and we already perform a lot of I/O during checkpoints so isn't it > > wise to flush during every checkpoint. We may argue that there is no > > extra advantage of that as we are not optimizing for crash recovery > > but OTOH there is no reason for not doing so for other checkpoints or > > we are worried about the concurrency with parallel walsender running > > during non shutdown checkpoint if so then better we explain that as > > well? If it is already discussed in the thread and we have a > > conclusion on this then maybe we can mention this in comments? > > > > The point is that at the time of non-shutdown checkpoints, it is not > clear that there is an extra advantage but we will definitely add > extra I/O for this. Because at other times, we will already be saving > the slot from time to time as the replication makes progress. And, we > also need to flush such slots during shutdown for correctness for some > use cases like upgrades. We can probably add something like: "At other > times, the walsender keeps saving the slot from time to time as the > replication progresses, so there is no clear advantage of flushing > additional slots at the time of checkpoint". Will that work for you? Yeah that comments will work out, my only concern was because we added an explicit condition that it should be synced only during shutdown checkpoint so better comments also explicitly explains the reason. Anyway I am fine with either way whether we sync at the shutdown checkpoint or all the checkpoint. Because I/O for slot sync during checkpoint time should not be a real worry and with that if we can avoid additional code with extra conditions then it's better because such code branches will be frequently hit and I think for testability pov we prefer to add code in common path unless there is some overhead or it is specifically meant for that branch only. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Impact of checkpointer during pg_upgrade
On Tue, Sep 5, 2023 at 9:38 AM Amit Kapila wrote: > > On Mon, Sep 4, 2023 at 4:19 PM Dilip Kumar wrote: > > > > Said that there is a possibility that some of the slots which got > > invalidated even on the previous checkpoint might get the same LSN as > > the slot which got invalidated later if there is no activity between > > these two checkpoints. So if we go with this approach then there is > > some risk of migrating some of the slots which were already > > invalidated even before the shutdown checkpoint. > > > > I think even during the shutdown checkpoint, after writing shutdown > checkpoint WAL, we can invalidate some slots that in theory are safe > to migrate/copy because all the WAL for those slots would also have > been sent. So, those would be similar to what we invalidate during the > upgrade, no? That's correct. > If so, I think it is better to have the same behavior for > invalidated slots irrespective of the time it gets invalidated. We can > either give an error for such slots during the upgrade (which means > disallow the upgrade) or simply ignore such slots during the upgrade. > I would prefer ERROR but if we want to ignore such slots, we can > probably inform the user in some way about ignored slots, so that she > can later drop corresponding subscritions or recreate such slots and > do the required sync-up to continue the replication. Earlier I was thinking that ERRORing out is better so that the user can take necessary action for the invalidated slots and then retry upgrade. But thinking again I could not find what are the advantages of this because if we error out then also users need to restart the old cluster again and have to drop the corresponding subscriptions OTOH if we allow the upgrade by ignoring the slots then also the user has to take similar actions on the new cluster? So what's the advantage of erroring out over upgrading? I see a clear advantage of upgrading is that the user wants to upgrade and that's successful without reattempting. If we say that if we error out and then there are some option for user to salvage those invalidated slots and he can somehow migrate those slot as well by retrying upgrade then it makes sense to error out and let user take some action on old cluster, but if all he has to do is to drop the subscription or recreate the slot in both the cases then letting the upgrade pass is better option at least IMHO. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: persist logical slots to disk during shutdown checkpoint
On Tue, Sep 5, 2023 at 9:10 AM Dilip Kumar wrote: > > On Fri, Sep 1, 2023 at 12:16 PM vignesh C wrote: > > > > On Fri, 1 Sept 2023 at 10:06, Amit Kapila wrote: > > > > > > On Thu, Aug 31, 2023 at 6:12 PM Ashutosh Bapat > > > wrote: > > > > > > > > On Thu, Aug 31, 2023 at 2:52 PM Amit Kapila > > > > wrote: > > > > > > > > > > All but one. Normally, the idea of marking dirty is to indicate that > > > > > we will actually write/flush the contents at a later point (except > > > > > when required for correctness) as even indicated in the comments atop > > > > > ReplicatioinSlotMarkDirty(). However, I see your point that we use > > > > > that protocol at all the current places including CreateSlotOnDisk(). > > > > > So, we can probably do it here as well. > > > > > > > > yes > > > > > > > > > > I think we should also ensure that slots are not invalidated > > > (slot.data.invalidated != RS_INVAL_NONE) before marking them dirty > > > because we don't allow decoding from such slots, so we shouldn't > > > include those. > > > > Added this check. > > > > Apart from this I have also fixed the following issues that were > > agreed on: a) Setting slots to dirty in CheckPointReplicationSlots > > instead of setting it in SaveSlotToPath b) The comments were moved > > from ReplicationSlot and moved to CheckPointReplicationSlots c) Tests > > will be run in autovacuum = off d) updating last_saved_confirmed_flush > > based on cp.slotdata.confirmed_flush rather than > > slot->data.confirmed_flush. > > I have also added the commitfest entry for this at [1]. > > The overall idea looks fine to me > > + > + /* > + * We won't ensure that the slot is persisted after the > + * confirmed_flush LSN is updated as that could lead to frequent > + * writes. However, we need to ensure that we do persist the slots at > + * the time of shutdown whose confirmed_flush LSN is changed since we > + * last saved the slot to disk. This will help in avoiding retreat of > + * the confirmed_flush LSN after restart. > + */ > + if (is_shutdown && SlotIsLogical(s)) > + { > + SpinLockAcquire(&s->mutex); > + if (s->data.invalidated == RS_INVAL_NONE && > + s->data.confirmed_flush != s->last_saved_confirmed_flush) > + s->dirty = true; > + SpinLockRelease(&s->mutex); > + } > > The comments don't mention anything about why we are just flushing at > the shutdown checkpoint. I mean the checkpoint is not that frequent > and we already perform a lot of I/O during checkpoints so isn't it > wise to flush during every checkpoint. We may argue that there is no > extra advantage of that as we are not optimizing for crash recovery > but OTOH there is no reason for not doing so for other checkpoints or > we are worried about the concurrency with parallel walsender running > during non shutdown checkpoint if so then better we explain that as > well? If it is already discussed in the thread and we have a > conclusion on this then maybe we can mention this in comments? > The point is that at the time of non-shutdown checkpoints, it is not clear that there is an extra advantage but we will definitely add extra I/O for this. Because at other times, we will already be saving the slot from time to time as the replication makes progress. And, we also need to flush such slots during shutdown for correctness for some use cases like upgrades. We can probably add something like: "At other times, the walsender keeps saving the slot from time to time as the replication progresses, so there is no clear advantage of flushing additional slots at the time of checkpoint". 
Will that work for you? Having said that, I am not opposed to doing it for non-shutdown checkpoints if one makes a separate case for it. -- With Regards, Amit Kapila.
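As a rough illustration of the property being discussed: the value reported below on the publisher should not move backwards across a clean shutdown and restart once changed slots are flushed at the shutdown checkpoint (the query is illustrative only):

SELECT slot_name, confirmed_flush_lsn, restart_lsn
  FROM pg_replication_slots
 WHERE slot_type = 'logical';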
Re: Impact of checkpointer during pg_upgrade
On Mon, Sep 4, 2023 at 4:19 PM Dilip Kumar wrote: > > Said that there is a possibility that some of the slots which got > invalidated even on the previous checkpoint might get the same LSN as > the slot which got invalidated later if there is no activity between > these two checkpoints. So if we go with this approach then there is > some risk of migrating some of the slots which were already > invalidated even before the shutdown checkpoint. > I think even during the shutdown checkpoint, after writing shutdown checkpoint WAL, we can invalidate some slots that in theory are safe to migrate/copy because all the WAL for those slots would also have been sent. So, those would be similar to what we invalidate during the upgrade, no? If so, I think it is better to have the same behavior for invalidated slots irrespective of the time it gets invalidated. We can either give an error for such slots during the upgrade (which means disallow the upgrade) or simply ignore such slots during the upgrade. I would prefer ERROR but if we want to ignore such slots, we can probably inform the user in some way about ignored slots, so that she can later drop corresponding subscriptions or recreate such slots and do the required sync-up to continue the replication. -- With Regards, Amit Kapila.
Re: proposal: psql: show current user in prompt
po 4. 9. 2023 v 14:24 odesílatel Jelte Fennema napsal: > On Sun, 3 Sept 2023 at 20:58, Pavel Stehule > wrote: > > here is an try > > Overall it does what I had in mind. Below a few suggestions: > > +int > +PQprotocolSubversion(const PGconn *conn) > > Ugh, it's quite annoying that the original PQprotocolVersion only > returns the major version and thus we need this new function. It > seems like it would be much nicer if it returned a number similar to > PQserverVersion. I think it might be nicer to change PQprotocolVersion > to do that than to add another function. We could do: > > return PG_PROTOCOL_MAJOR(conn->pversion) * 100 + > PG_PROTOCOL_MINOR(conn->pversion); > > or even: > > if (PG_PROTOCOL_MAJOR(conn->pversion) == 3 && > PG_PROTOCOL_MINOR(conn->pversion)) > return 3; > else > return PG_PROTOCOL_MAJOR(conn->pversion) * 100 + > PG_PROTOCOL_MINOR(conn->pversion); > > The second option would be safest backwards compatibility wise, but in > practice you would only get another value than 3 (or 0) when > connecting to pre 7.4 servers. That seems old enough that I don't > think anyone is actually calling this function. **I'd like some > feedback from others on this though.** > This point is open. I'll wait for a reply from others. > > + /* The protocol 3.0 is required */ > + if (PG_PROTOCOL_MAJOR(their_version) == 3) > + conn->pversion = their_version; > > Let's compare against the actual PG_PROTOCOL_EARLIEST and > PG_PROTOCOL_LATEST to determine if the version is supported or not. > changed From f1975b784627fda06b23303c4f1fb6972d80a0a0 Mon Sep 17 00:00:00 2001 From: "ok...@github.com" Date: Mon, 24 Jul 2023 20:18:16 +0200 Subject: [PATCH 4/4] Implementation of %N prompt placeholder It is based on forcing reporting feature"role" GUC to client. --- doc/src/sgml/ref/psql-ref.sgml | 19 ++- src/bin/psql/command.c | 13 + src/bin/psql/prompt.c | 28 src/bin/psql/settings.h| 1 + src/bin/psql/startup.c | 32 5 files changed, 92 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index d94e3cacfc..8b267a6da6 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -4568,7 +4568,24 @@ testdb=> INSERT INTO my_table VALUES (:'content'); The port number at which the database server is listening. - + +%N + + + The database role name. This value is specified by command + SET ROLE. Until execution of this command + the value is set to the database session user name. + + + + This substitution requires PostgreSQL + version 16 and up. When you use older version, the empty string + is used instead. 
+ + + + + %n diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index bcd8eb3538..bad0fdf415 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -3883,6 +3883,7 @@ SyncVariables(void) { char vbuf[32]; const char *server_version; + PGresult *result; /* get stuff from connection */ pset.encoding = PQclientEncoding(pset.db); @@ -3912,6 +3913,18 @@ SyncVariables(void) /* send stuff to it, too */ PQsetErrorVerbosity(pset.db, pset.verbosity); PQsetErrorContextVisibility(pset.db, pset.show_context); + + /* link role GUC when it is needed for prompt */ + if (pset.prompt_shows_role) + result = PQlinkParameterStatus(pset.db, "role"); + else + result = PQunlinkParameterStatus(pset.db, "role"); + + if (PQresultStatus(result) != PGRES_COMMAND_OK) + pg_log_info("cannot set REPORT flag on configuration variable \"role\": %s", + PQerrorMessage(pset.db)); + + PQclear(result); } /* diff --git a/src/bin/psql/prompt.c b/src/bin/psql/prompt.c index 969cd9908e..d306c91d19 100644 --- a/src/bin/psql/prompt.c +++ b/src/bin/psql/prompt.c @@ -165,6 +165,34 @@ get_prompt(promptStatus_t status, ConditionalStack cstack) if (pset.db) strlcpy(buf, session_username(), sizeof(buf)); break; + /* DB server user role */ +case 'N': + if (pset.db) + { + const char *rolename = NULL; + + /* + * This feature requires GUC "role" to be marked + * by GUC_REPORT flag. This is done by PQlinkParameterStatus + * function. This function requires protocol 3.1 (ReportGUC + * message). Fallback is empty string. + */ + if (PQprotocolVersion(pset.db) == 3 && + PQprotocolSubversion(pset.db) >= 1) + { + rolename = PQparameterStatus(pset.db, "role"); + + /* fallback when role is not set yet */ + if (rolename && strcmp(rolename, "none") == 0) +rolename = session_username(); + } + + if (rolename) + strlcpy(buf, rolename, sizeof(buf)); + else + buf[0] = '\0'; +
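For illustration, a hypothetical psql session showing what the proposed %N placeholder is meant to display once the patch is applied (the prompt string and role name are made up):

\set PROMPT1 '%N@%/%R%# '
-- the prompt initially shows the session user name
SET ROLE report_reader;   -- the prompt now shows report_reader
RESET ROLE;               -- the prompt falls back to the session user name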
Re: persist logical slots to disk during shutdown checkpoint
On Fri, Sep 1, 2023 at 12:16 PM vignesh C wrote: > > On Fri, 1 Sept 2023 at 10:06, Amit Kapila wrote: > > > > On Thu, Aug 31, 2023 at 6:12 PM Ashutosh Bapat > > wrote: > > > > > > On Thu, Aug 31, 2023 at 2:52 PM Amit Kapila > > > wrote: > > > > > > > > All but one. Normally, the idea of marking dirty is to indicate that > > > > we will actually write/flush the contents at a later point (except > > > > when required for correctness) as even indicated in the comments atop > > > > ReplicatioinSlotMarkDirty(). However, I see your point that we use > > > > that protocol at all the current places including CreateSlotOnDisk(). > > > > So, we can probably do it here as well. > > > > > > yes > > > > > > > I think we should also ensure that slots are not invalidated > > (slot.data.invalidated != RS_INVAL_NONE) before marking them dirty > > because we don't allow decoding from such slots, so we shouldn't > > include those. > > Added this check. > > Apart from this I have also fixed the following issues that were > agreed on: a) Setting slots to dirty in CheckPointReplicationSlots > instead of setting it in SaveSlotToPath b) The comments were moved > from ReplicationSlot and moved to CheckPointReplicationSlots c) Tests > will be run in autovacuum = off d) updating last_saved_confirmed_flush > based on cp.slotdata.confirmed_flush rather than > slot->data.confirmed_flush. > I have also added the commitfest entry for this at [1]. The overall idea looks fine to me + + /* + * We won't ensure that the slot is persisted after the + * confirmed_flush LSN is updated as that could lead to frequent + * writes. However, we need to ensure that we do persist the slots at + * the time of shutdown whose confirmed_flush LSN is changed since we + * last saved the slot to disk. This will help in avoiding retreat of + * the confirmed_flush LSN after restart. + */ + if (is_shutdown && SlotIsLogical(s)) + { + SpinLockAcquire(&s->mutex); + if (s->data.invalidated == RS_INVAL_NONE && + s->data.confirmed_flush != s->last_saved_confirmed_flush) + s->dirty = true; + SpinLockRelease(&s->mutex); + } The comments don't mention anything about why we are just flushing at the shutdown checkpoint. I mean the checkpoint is not that frequent and we already perform a lot of I/O during checkpoints so isn't it wise to flush during every checkpoint. We may argue that there is no extra advantage of that as we are not optimizing for crash recovery but OTOH there is no reason for not doing so for other checkpoints or we are worried about the concurrency with parallel walsender running during non shutdown checkpoint if so then better we explain that as well? If it is already discussed in the thread and we have a conclusion on this then maybe we can mention this in comments? /* * Flush all replication slots to disk. * - * This needn't actually be part of a checkpoint, but it's a convenient - * location. + * is_shutdown is true in case of a shutdown checkpoint. */ void -CheckPointReplicationSlots(void) +CheckPointReplicationSlots(bool is_shutdown) It seems we have removed two lines from the function header comments, is this intentional or accidental? Other than this patch LGTM. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Optionally using a better backtrace library?
On Mon, Jul 03, 2023 at 11:58:25AM +0200, Alvaro Herrera wrote: > On 2023-Jul-02, Andres Freund wrote: > > I like that we now have a builtin backtrace ability. Unfortunately I think > > the > > backtraces are often not very useful, because only externally visible > > functions are symbolized. > > Agreed, these backtraces are pretty close to useless. Not completely, > but I haven't found a practical way to use them for actual debugging > of production problems. For what it's worth, I use the attached script to convert the current errbacktrace output to a fully-symbolized backtrace. Nonetheless, ... > > I hacked it up for ereport() to debug something, and the backtraces are > > considerably better: > > > > 2023-07-02 10:52:54.863 PDT [1398207][client backend][:0][[unknown]] LOG: > > will crash > > 2023-07-02 10:52:54.863 PDT [1398207][client backend][:0][[unknown]] > > BACKTRACE: > > [0x55fcd03e6143] PostgresMain: > > ../../../../home/andres/src/postgresql/src/backend/tcop/postgres.c:4126 > > [0x55fcd031154c] BackendRun: > > ../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:4461 > > [0x55fcd0310dd8] BackendStartup: > > ../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:4189 > > [0x55fcd030ce75] ServerLoop: > > ../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:1779 > > Yeah, this looks much more usable. ... +1 for offering this. #! /usr/bin/perl # Usage: errbacktrace2line EXE_FILE ) { if (/^[[:space:]]*(\/.*\.so)$last2_fields/) { my($obj, $offset, $return_addr) = ($1, $2, $3); conditional_flush_addr $obj; push @addr, ($offset || $return_addr); } elsif (/$last2_fields/) { my($offset, $return_addr) = ($1, $2); conditional_flush_addr $bin; push @addr, ($offset || $return_addr); } } flush_addr;
Re: persist logical slots to disk during shutdown checkpoint
On Tue, Sep 5, 2023 at 7:54 AM Zhijie Hou (Fujitsu) wrote: > > On Monday, September 4, 2023 6:15 PM vignesh C wrote: > > > > On Mon, 4 Sept 2023 at 15:20, Amit Kapila wrote: > > > > > > On Fri, Sep 1, 2023 at 10:50 AM vignesh C wrote: > > > > > > > > On Fri, 1 Sept 2023 at 10:06, Amit Kapila > > > > wrote: > > > > > > > > > > On Thu, Aug 31, 2023 at 6:12 PM Ashutosh Bapat > > > > > wrote: > > > > > > > > > > > > On Thu, Aug 31, 2023 at 2:52 PM Amit Kapila > > wrote: > > > > > > > > > > > > > > All but one. Normally, the idea of marking dirty is to > > > > > > > indicate that we will actually write/flush the contents at a > > > > > > > later point (except when required for correctness) as even > > > > > > > indicated in the comments atop ReplicatioinSlotMarkDirty(). > > > > > > > However, I see your point that we use that protocol at all the > > > > > > > current > > places including CreateSlotOnDisk(). > > > > > > > So, we can probably do it here as well. > > > > > > > > > > > > yes > > > > > > > > > > > > > > > > I think we should also ensure that slots are not invalidated > > > > > (slot.data.invalidated != RS_INVAL_NONE) before marking them dirty > > > > > because we don't allow decoding from such slots, so we shouldn't > > > > > include those. > > > > > > > > Added this check. > > > > > > > > Apart from this I have also fixed the following issues that were > > > > agreed on: a) Setting slots to dirty in CheckPointReplicationSlots > > > > instead of setting it in SaveSlotToPath > > > > > > > > > > + if (is_shutdown && SlotIsLogical(s)) { SpinLockAcquire(&s->mutex); > > > + if (s->data.invalidated == RS_INVAL_NONE && > > > + s->data.confirmed_flush != s->last_saved_confirmed_flush) dirty = > > > + s->true; > > > > > > I think it is better to use ReplicationSlotMarkDirty() as that would > > > be consistent with all other usages. > > > > ReplicationSlotMarkDirty works only on MyReplicationSlot whereas > > CheckpointReplicationSlots loops through all the slots and marks the > > appropriate slot as dirty, we might have to change ReplicationSlotMarkDirty > > to > > take the slot as input parameter and all caller should pass > > MyReplicationSlot. > > Personally, I feel if we want to centralize the code of marking dirty into a > function, we can introduce a new static function MarkSlotDirty(slot) to mark > passed slot dirty and let ReplicationSlotMarkDirty and > CheckpointReplicationSlots call it. Like: > > void > ReplicationSlotMarkDirty(void) > { > MarkSlotDirty(MyReplicationSlot); > } > > +static void > +MarkSlotDirty(ReplicationSlot *slot) > +{ > + Assert(slot != NULL); > + > + SpinLockAcquire(&slot->mutex); > + slot->just_dirtied = true; > + slot->dirty = true; > + SpinLockRelease(&slot->mutex); > +} > > This is somewhat similar to the relation between ReplicationSlotSave(serialize > my backend's replications slot) and SaveSlotToPath(save the passed slot). > > > Another thing is we have already taken spin lock to access > > last_confirmed_flush_lsn from CheckpointReplicationSlots, we could set dirty > > flag here itself, else we will have to release the lock and call > > ReplicationSlotMarkDirty which will take lock again. > > Yes, this is unavoidable, but maybe it's not a big problem as > we only do it at shutdown. > True but still it doesn't look elegant. I also thought about having a probably inline function that marks both just_dirty and dirty fields. However, that requires us to assert that the caller has already acquired a spinlock. 
I see a macro SpinLockFree() that might help but it didn't seem to be used anywhere in the code so not sure if we can rely on it. > > Instead shall we set just_dirtied also in CheckpointReplicationSlots? > > Thoughts? > > I agree we'd better set just_dirtied to true to ensure we will serialize slot > info here, because if some other processes just serialized the slot, the dirty > flag will be reset to false if we don't set just_dirtied to true in > CheckpointReplicationSlots(), this race condition may not exists for now, but > seems better to completely forbid it by setting just_dirtied. > Agreed, and it is better to close any such possibility because we can't say with certainty about manual slots. This seems better than the other ideas we discussed. -- With Regards, Amit Kapila.
Re: Report planning memory in EXPLAIN ANALYZE
Using your patch I found out one redundant memory usage in the planner [1]. It can be interesting as an example of how this patch can detect problems. [1] Optimize planner memory consumption for huge arrays https://www.postgresql.org/message-id/flat/em9939439a-441a-4b27-a977-ebdf5987dc49%407d14f008.com -- Regards, Andrei Lepikhov On Thu, Aug 24, 2023, at 5:31 PM, Ashutosh Bapat wrote: > Sorry for the late reply. I was working on David's suggestion. > > Here's a response to your questions and also a new set of patches. > > On Tue, Aug 22, 2023 at 1:16 PM jian he wrote: >> Hi. I tested it. >> not sure if following is desired behavior. first run with explain, >> then run with explain(summary on). >> the second time, Planning Memory: 0 bytes. >> >> regression=# PREPARE q4 AS SELECT 1 AS a; >> explain EXECUTE q4; >> QUERY PLAN >> -- >> Result (cost=0.00..0.01 rows=1 width=4) >> (1 row) >> >> regression=# explain(summary on) EXECUTE q4; >> QUERY PLAN >> -- >> Result (cost=0.00..0.01 rows=1 width=4) >> Planning Time: 0.009 ms >> Planning Memory: 0 bytes >> (3 rows) >> - > > Yes. This is expected since the plan is already available and no > memory is required to fetch it from the cache. I imagine, if there > were parameters to the prepared plan, it would consume some memory to > evaluate those parameters and some more memory if replanning was > required. > > >> previously, if you want stats of a given memory context and its >> children, you can only use MemoryContextStatsDetail. >> but it will only go to stderr or LOG_SERVER_ONLY. >> Now, MemoryContextMemUsed is being exposed. I can do something like: >> >> mem_consumed = MemoryContextMemUsed(CurrentMemoryContext); >> //do stuff. >> mem_consumed = MemoryContextMemUsed(CurrentMemoryContext) - mem_consumed; >> >> it will give me the NET memory consumed by doing staff in between. Is >> my understanding correct? > > Yes. > > Here are three patches based on the latest master. > > 0001 > > this is same as the previous patch with few things fixed. 1. Call > MemoryContextMemUsed() before INSTR_TIME_SET_CURRENT so that the time > taken by MemoryContextMemUsed() is not counted in planning time. 2. In > ExplainOnePlan, use a separate code block for reporting memory. > > 0002 > > This patch reports both memory allocated and memory used in the > CurrentMemoryContext at the time of planning. It converts "Planning > Memory" into a section with two values reported as "used" and > "allocated" as below > > #explain (summary on) select * from pg_class c, pg_type t where > c.reltype = t.oid; > QUERY PLAN > -- > Hash Join (cost=28.84..47.08 rows=414 width=533) >... snip ... > Planning Time: 9.274 ms > Planning Memory: used=80848 bytes allocated=90112 bytes > (7 rows) > > In JSON format > #explain (summary on, format json) select * from pg_class c, pg_type t > where c.reltype = t.oid; > QUERY PLAN > --- > [+ >{ + > "Plan": {+ > ... snip ... > }, + > "Planning Time": 0.466, + > "Planning Memory": { + >"Used": 80608, + >"Allocated": 90112 + > }+ >} + > ] > (1 row) > > PFA explain and explain analyze output in all the formats. > > The patch adds MemoryContextMemConsumed() which is similar to > MemoryContextStats() or MemoryContextStatsDetails() except 1. the > later always prints the memory statistics to either stderr or to the > server error log and 2. it doesn't return MemoryContextCounters that > it gathered. We should probably change MemoryContextStats or > MemoryContextStatsDetails() according to those two points and not add > MemoryContextMemConsumed(). 
> > I have not merged this into 0001 yet. But once we agree upon whether > this is the right thing to do, I will merge it into 0001. > > 0003 > > When reporting memory allocated, a confusion may arise as to whether > to report the "net" memory allocated between start and end of planner > OR only the memory that remains allocated after end. This confusion > can be avoided by using an exclusive memory context (child of > CurrentMemoryContext) for planning. That's what 0003 implements as > suggested by David. As mentioned in one of the comments in the patch, > in order to measure memory allocated accurately the new MemoryContext > has to be of the same type as the memory context that will be used > otherwise by the pla
Re: Sync scan & regression tests
Heikki Linnakangas writes: > With shared_buffers='20MB', the tests passed. I'm going to change it > back to 10MB now, so that we continue to cover that case. So chipmunk is getting through the core tests now, but instead it is failing in contrib/pg_visibility [1]: diff -U3 /home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/expected/pg_visibility.out /home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/results/pg_visibility.out --- /home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/expected/pg_visibility.out 2022-10-08 19:00:15.905074105 +0300 +++ /home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/results/pg_visibility.out 2023-09-02 00:25:51.814148116 +0300 @@ -218,7 +218,8 @@ 0 | t | t 1 | t | t 2 | t | t -(3 rows) + 3 | f | f +(4 rows) select * from pg_check_frozen('copyfreeze'); t_ctid I find this easily reproducible by setting shared_buffers=10MB. But I'm confused about why, because the affected test case dates to Tomas' commit 7db0cd214 of 2021-01-17, and chipmunk passed many times after that. Might be worth bisecting in the interval where chipmunk wasn't reporting? regards, tom lane [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2023-09-01%2015%3A15%3A56
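For reference, the failing output appears to come from the pg_visibility_map() check over the COPY FREEZE'd table in that test, i.e. something like:

select * from pg_visibility_map('copyfreeze');
-- expected: every page reported all-visible and all-frozen; on chipmunk with
-- shared_buffers = 10MB an extra page 3 comes back as (f, f)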
RE: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases
On Monday, September 4, 2023 10:42 PM Masahiko Sawada wrote: Hi, > On Mon, Sep 4, 2023 at 9:38 PM Zhijie Hou (Fujitsu) > wrote: > > > > On Wednesday, July 5, 2023 2:53 PM Masahiko Sawada > wrote: > > > > > Thanks for reviewing the patch. Pushed. > > > > My colleague Vignesh noticed that the newly added test cases were failing in > BF animal sungazer[1]. > > Thank you for reporting! > > > > > The test failed to drop the slot which is active on publisher. > > > > --error-log-- > > This failure is because pg_drop_replication_slot fails with "replication > > slot > "test_tab2_sub" is active for PID 55771638": > > 2023-09-02 09:00:04.806 UTC [12910732:4] 026_stats.pl LOG: statement: > > SELECT pg_drop_replication_slot('test_tab2_sub') > > 2023-09-02 09:00:04.807 UTC [12910732:5] 026_stats.pl ERROR: > > replication slot "test_tab2_sub" is active for PID 55771638 > > 2023-09-02 09:00:04.807 UTC [12910732:6] 026_stats.pl STATEMENT: > > SELECT pg_drop_replication_slot('test_tab2_sub') > > - > > > > I the reason is that the test DISABLEd the subscription before > > dropping the slot, while "ALTER SUBSCRIPTION DISABLE" doesn't wait for > > the walsender to release the slot, so it's possible that the walsender > > is still alive when calling > > pg_drop_replication_slot() to drop the slot on > > publisher(pg_drop_xxxslot() doesn't wait for slot to be released). > > I agree with your analysis. > > > > > I think we can wait for the slot to become inactive before dropping like: > > $node_primary->poll_query_until('otherdb', > > "SELECT NOT EXISTS (SELECT 1 FROM pg_replication_slots > WHERE active_pid IS NOT NULL)" > > ) > > > > I prefer this approach but it would be better to specify the slot name in the > query. > > > Or we can just don't drop the slot as it’s the last testcase. > > Since we might add other tests after that in the future, I think it's better > to drop > the replication slot (and subscription). > > > > > Another thing might be worth considering is we can add one parameter > > in > > pg_drop_replication_slot() to let user control whether to wait or not, > > and the test can be fixed as well by passing nowait=false to the func. > > While it could be useful in general we cannot use this approach for this issue > since it cannot be backpatched to older branches. We might want to discuss it > on a new thread. > > I've attached a draft patch to fix this issue. Feedback is very welcome. Thanks, it looks good to me. Best Regards, Hou zj
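For concreteness, the condition the test could poll until it becomes true before calling pg_drop_replication_slot(), with the slot name spelled out as suggested above:

SELECT NOT EXISTS (
    SELECT 1 FROM pg_replication_slots
     WHERE slot_name = 'test_tab2_sub' AND active_pid IS NOT NULL);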
RE: persist logical slots to disk during shutdown checkpoint
On Monday, September 4, 2023 6:15 PM vignesh C wrote: > > On Mon, 4 Sept 2023 at 15:20, Amit Kapila wrote: > > > > On Fri, Sep 1, 2023 at 10:50 AM vignesh C wrote: > > > > > > On Fri, 1 Sept 2023 at 10:06, Amit Kapila wrote: > > > > > > > > On Thu, Aug 31, 2023 at 6:12 PM Ashutosh Bapat > > > > wrote: > > > > > > > > > > On Thu, Aug 31, 2023 at 2:52 PM Amit Kapila > wrote: > > > > > > > > > > > > All but one. Normally, the idea of marking dirty is to > > > > > > indicate that we will actually write/flush the contents at a > > > > > > later point (except when required for correctness) as even > > > > > > indicated in the comments atop ReplicatioinSlotMarkDirty(). > > > > > > However, I see your point that we use that protocol at all the > > > > > > current > places including CreateSlotOnDisk(). > > > > > > So, we can probably do it here as well. > > > > > > > > > > yes > > > > > > > > > > > > > I think we should also ensure that slots are not invalidated > > > > (slot.data.invalidated != RS_INVAL_NONE) before marking them dirty > > > > because we don't allow decoding from such slots, so we shouldn't > > > > include those. > > > > > > Added this check. > > > > > > Apart from this I have also fixed the following issues that were > > > agreed on: a) Setting slots to dirty in CheckPointReplicationSlots > > > instead of setting it in SaveSlotToPath > > > > > > > + if (is_shutdown && SlotIsLogical(s)) { SpinLockAcquire(&s->mutex); > > + if (s->data.invalidated == RS_INVAL_NONE && > > + s->data.confirmed_flush != s->last_saved_confirmed_flush) dirty = > > + s->true; > > > > I think it is better to use ReplicationSlotMarkDirty() as that would > > be consistent with all other usages. > > ReplicationSlotMarkDirty works only on MyReplicationSlot whereas > CheckpointReplicationSlots loops through all the slots and marks the > appropriate slot as dirty, we might have to change ReplicationSlotMarkDirty to > take the slot as input parameter and all caller should pass MyReplicationSlot. Personally, I feel if we want to centralize the code of marking dirty into a function, we can introduce a new static function MarkSlotDirty(slot) to mark passed slot dirty and let ReplicationSlotMarkDirty and CheckpointReplicationSlots call it. Like: void ReplicationSlotMarkDirty(void) { MarkSlotDirty(MyReplicationSlot); } +static void +MarkSlotDirty(ReplicationSlot *slot) +{ + Assert(slot != NULL); + + SpinLockAcquire(&slot->mutex); + slot->just_dirtied = true; + slot->dirty = true; + SpinLockRelease(&slot->mutex); +} This is somewhat similar to the relation between ReplicationSlotSave(serialize my backend's replications slot) and SaveSlotToPath(save the passed slot). > Another thing is we have already taken spin lock to access > last_confirmed_flush_lsn from CheckpointReplicationSlots, we could set dirty > flag here itself, else we will have to release the lock and call > ReplicationSlotMarkDirty which will take lock again. Yes, this is unavoidable, but maybe it's not a big problem as we only do it at shutdown. > Instead shall we set just_dirtied also in CheckpointReplicationSlots? > Thoughts? I agree we'd better set just_dirtied to true to ensure we will serialize slot info here, because if some other processes just serialized the slot, the dirty flag will be reset to false if we don't set just_dirtied to true in CheckpointReplicationSlots(), this race condition may not exists for now, but seems better to completely forbid it by setting just_dirtied. Best Regards, Hou zj
Re: pg_upgrade fails with in-place tablespace
Thank you for your response. It is evident that there is a need for this feature in our system. Firstly, our customers express their desire to utilize tablespaces for table management, without necessarily being concerned about the directory location of these tablespaces. Secondly, currently PG only supports absolute-path tablespaces, but in-place tablespaces are very likely to become popular in the future. Therefore, it is essential to incorporate support for in-place tablespaces in the pg_upgrade feature. I intend to implement this functionality in our system to accommodate our customers' requirements. It would be highly appreciated if the official PG could also incorporate support for this feature. -- Best regards, Rui Zhao -- From: Michael Paquier Sent At: 2023 Sep. 1 (Fri.) 12:58 To: Mark Cc: pgsql-hackers Subject: Re: pg_upgrade fails with in-place tablespace On Sat, Aug 19, 2023 at 08:11:28PM +0800, Rui Zhao wrote: > Please refer to the TAP test I have included for a better understanding > of my suggestion. Sure, but it seems to me that my original question is not really answered: what's your use case for being able to support in-place tablespaces in pg_upgrade? The original use case being such tablespaces is to ease the introduction of tests with primaries and standbys, which is not something that really applies to pg_upgrade, no? Perhaps there is meaning in having more TAP tests with tablespaces and a combination of primary/standbys, still having in-place tablespaces does not really make much sense to me because, as these are in the data folder, we don't use them to test the path re-creation logic. I think that we should just add a routine in check.c that scans pg_tablespace, reporting all the non-absolute paths found with their associated tablespace names. -- Michael
Re: Cleaning up array_in()
On Mon, Sep 4, 2023 at 8:00 AM jian he wrote: > > hi. > attached v4. > v4, 0001 to 0005 is the same as v3 in > https://www.postgresql.org/message-id/5859ce4e-2be4-92b0-c85c-e1e24eab57c6%40iki.fi > > v4-0006 doing some modifications to address the corner case mentioned > in the previous thread (like select '{{1,},{1},}'::text[]). > also fixed all these FIXME, Heikki mentioned in the code. > > v4-0007 refactor ReadDimensionInt. to make the array dimension bound > variables within the INT_MIN and INT_MAX. so it will make select > '[21474836488:21474836489]={1,2}'::int[]; fail. attached, same as v4, but delete unused variable {nitems_according_to_dims}. From ca35ce0cba131e6cb65147d4f4dd6ea4277bcf15 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Sat, 8 Jul 2023 22:19:48 +0300 Subject: [PATCH v4 5/7] Determine array dimensions and parse the elements in one pass. The logic in ArrayCount() and ReadArrayStr() was hard to follow, because they mixed low-level escaping and quoting with tracking the curly braces. Introduce a ReadArrayToken() function that tokenizes the input, handling quoting and escaping. Instead of having separate ArrayCount() and ReadArrayStr() functions, move the logic for determining and checking the structure of the dimensions to ReadArrayStr(). ReadArrayStr() now determines the dimensions and parses the elements in one pass. ReadArrayStr() no longer scribbles on the input. Instead, it allocates a separate scratch buffer that it uses to hold the string representation of each element. The scratch buffer must be as large as the input string, so this doesn't save memory, but palloc(strlen(x)) is cheaper than pstrdup(x). --- src/backend/utils/adt/arrayfuncs.c | 1026 +++--- src/test/regress/expected/arrays.out |2 +- src/tools/pgindent/typedefs.list |2 +- 3 files changed, 431 insertions(+), 599 deletions(-) diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c index 7a63db09..19d7af6d 100644 --- a/src/backend/utils/adt/arrayfuncs.c +++ b/src/backend/utils/adt/arrayfuncs.c @@ -53,18 +53,6 @@ bool Array_nulls = true; PG_FREE_IF_COPY(array, n); \ } while (0) -typedef enum -{ - ARRAY_NO_LEVEL, - ARRAY_LEVEL_STARTED, - ARRAY_ELEM_STARTED, - ARRAY_QUOTED_ELEM_STARTED, - ARRAY_QUOTED_ELEM_COMPLETED, - ARRAY_ELEM_DELIMITED, - ARRAY_LEVEL_COMPLETED, - ARRAY_LEVEL_DELIMITED -} ArrayParseState; - /* Working state for array_iterate() */ typedef struct ArrayIteratorData { @@ -89,18 +77,30 @@ typedef struct ArrayIteratorData int current_item; /* the item # we're at in the array */ } ArrayIteratorData; +/* ReadArrayToken return type */ +typedef enum +{ + ATOK_LEVEL_START, + ATOK_LEVEL_END, + ATOK_DELIM, + ATOK_ELEM, + ATOK_ELEM_NULL, + ATOK_ERROR, +} ArrayToken; + static bool ReadDimensionInt(char **srcptr, int *result, const char *origStr, Node *escontext); static bool ReadArrayDimensions(char **srcptr, int *ndim, int *dim, int *lBound, const char *origStr, Node *escontext); -static int ArrayCount(const char *str, int *dim, char typdelim, - Node *escontext); -static bool ReadArrayStr(char *arrayStr, const char *origStr, - int nitems, +static bool ReadArrayStr(char **srcptr, FmgrInfo *inputproc, Oid typioparam, int32 typmod, char typdelim, int typlen, bool typbyval, char typalign, - Datum *values, bool *nulls, - bool *hasnulls, int32 *nbytes, Node *escontext); + int *ndim, int *dim, + int *nitems, + Datum **values, bool **nulls, + const char *origStr, Node *escontext); +static ArrayToken ReadArrayToken(char **srcptr, char *elembuf, char typdelim, + const char 
*origStr, Node *escontext); static void ReadArrayBinary(StringInfo buf, int nitems, FmgrInfo *receiveproc, Oid typioparam, int32 typmod, int typlen, bool typbyval, char typalign, @@ -186,12 +186,11 @@ array_in(PG_FUNCTION_ARGS) char typalign; char typdelim; Oid typioparam; - char *string_save, - *p; - int i, -nitems; - Datum *dataPtr; - bool *nullsPtr; + char *p; + int nitems; + int nitems_according_to_dims; + Datum *values; + bool *nulls; bool hasnulls; int32 nbytes; int32 dataoffset; @@ -234,15 +233,13 @@ array_in(PG_FUNCTION_ARGS) typdelim = my_extra->typdelim; typioparam = my_extra->typioparam; - /* Make a modifiable copy of the input */ - string_save = pstrdup(string); - /* + * Start processing the input string. + * * If the input string starts with dimension info, read and use that. - * Otherwise, we require the input to be in curly-brace style, and we - * prescan the input to determine dimensions. + * Otherwise, we determine the dimensions from the in curly-braces. */ - p = string_save; + p = string; if (!ReadArrayDimensions(&p, &ndim, dim, lBound, string, escontext)) return (Datum) 0; @@ -254,17 +251,11 @@ array_in(PG_FUNCTION_ARGS) (errcode(ERRCODE_INVALID_T
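For reference, the corner cases mentioned above boil down to inputs like these; the first two should be rejected by the reworked parser, while the explicit-dimensions form in the last one remains valid:

SELECT '{{1,},{1},}'::text[];                       -- malformed: trailing delimiters
SELECT '[21474836488:21474836489]={1,2}'::int[];    -- dimension bounds outside the int range
SELECT '[1:2]={3,4}'::int[];                        -- valid explicit-dimensions form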
Re: brininsert optimization opportunity
Rebased against latest HEAD. Regards, Soumyadeep (VMware) From 94a8acd3125aa4a613c238e435ed78dba9f40625 Mon Sep 17 00:00:00 2001 From: Soumyadeep Chakraborty Date: Sat, 29 Jul 2023 09:17:49 -0700 Subject: [PATCH v5 1/1] Reuse revmap and brin desc in brininsert brininsert() used to have code that performed per-tuple initialization of the revmap. That had some overhead. To mitigate, we introduce a new AM callback to clean up state at the end of all inserts, and perform the revmap cleanup there. --- contrib/bloom/blutils.c | 1 + doc/src/sgml/indexam.sgml| 14 +- src/backend/access/brin/brin.c | 74 +++- src/backend/access/gin/ginutil.c | 1 + src/backend/access/gist/gist.c | 1 + src/backend/access/hash/hash.c | 1 + src/backend/access/index/indexam.c | 15 ++ src/backend/access/nbtree/nbtree.c | 1 + src/backend/access/spgist/spgutils.c | 1 + src/backend/executor/execIndexing.c | 5 ++ src/include/access/amapi.h | 3 ++ src/include/access/brin_internal.h | 1 + src/include/access/genam.h | 2 + 13 files changed, 108 insertions(+), 12 deletions(-) diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c index f23fbb1d9e0..4830cb3fee6 100644 --- a/contrib/bloom/blutils.c +++ b/contrib/bloom/blutils.c @@ -131,6 +131,7 @@ blhandler(PG_FUNCTION_ARGS) amroutine->ambuild = blbuild; amroutine->ambuildempty = blbuildempty; amroutine->aminsert = blinsert; + amroutine->aminsertcleanup = NULL; amroutine->ambulkdelete = blbulkdelete; amroutine->amvacuumcleanup = blvacuumcleanup; amroutine->amcanreturn = NULL; diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml index e813e2b620a..17f7d6597f1 100644 --- a/doc/src/sgml/indexam.sgml +++ b/doc/src/sgml/indexam.sgml @@ -139,6 +139,7 @@ typedef struct IndexAmRoutine ambuild_function ambuild; ambuildempty_function ambuildempty; aminsert_function aminsert; +aminsertcleanup_function aminsertcleanup; ambulkdelete_function ambulkdelete; amvacuumcleanup_function amvacuumcleanup; amcanreturn_function amcanreturn; /* can be NULL */ @@ -355,11 +356,22 @@ aminsert (Relation indexRelation, within an SQL statement, it can allocate space in indexInfo->ii_Context and store a pointer to the data in indexInfo->ii_AmCache (which will be NULL - initially). + initially). If the data cached contains structures that can be simply pfree'd, + they will implicitly be pfree'd. But, if it contains more complex data, such + as Buffers or TupleDescs, additional cleanup is necessary. This additional + cleanup can be performed in indexinsertcleanup. +bool +aminsertcleanup (IndexInfo *indexInfo); + + Clean up state that was maintained across successive inserts in + indexInfo->ii_AmCache. This is useful if the data + contained is complex - like Buffers or TupleDescs which need additional + cleanup, unlike simple structures that will be implicitly pfree'd. + IndexBulkDeleteResult * ambulkdelete (IndexVacuumInfo *info, IndexBulkDeleteResult *stats, diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c index d4fec654bb6..76927beb0ec 100644 --- a/src/backend/access/brin/brin.c +++ b/src/backend/access/brin/brin.c @@ -58,6 +58,17 @@ typedef struct BrinBuildState BrinMemTuple *bs_dtuple; } BrinBuildState; +/* + * We use a BrinInsertState to capture running state spanning multiple + * brininsert invocations, within the same command. 
+ */ +typedef struct BrinInsertState +{ + BrinRevmap *bs_rmAccess; + BrinDesc *bs_desc; + BlockNumber bs_pages_per_range; +} BrinInsertState; + /* * Struct used as "opaque" during index scans */ @@ -72,6 +83,7 @@ typedef struct BrinOpaque static BrinBuildState *initialize_brin_buildstate(Relation idxRel, BrinRevmap *revmap, BlockNumber pagesPerRange); +static BrinInsertState *initialize_brin_insertstate(Relation idxRel, IndexInfo *indexInfo); static void terminate_brin_buildstate(BrinBuildState *state); static void brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange, bool include_partial, double *numSummarized, double *numExisting); @@ -117,6 +129,7 @@ brinhandler(PG_FUNCTION_ARGS) amroutine->ambuild = brinbuild; amroutine->ambuildempty = brinbuildempty; amroutine->aminsert = brininsert; + amroutine->aminsertcleanup = brininsertcleanup; amroutine->ambulkdelete = brinbulkdelete; amroutine->amvacuumcleanup = brinvacuumcleanup; amroutine->amcanreturn = NULL; @@ -140,6 +153,28 @@ brinhandler(PG_FUNCTION_ARGS) PG_RETURN_POINTER(amroutine); } +/* + * Initialize a BrinInsertState to maintain state to be used across multiple + * tuple inserts, within the same command. + */ +static BrinInsertState * +initialize_brin_insertstate(Relation idxRel, IndexInfo *indexInfo) +{ + BrinInsertState *bistate; + MemoryContext oldcxt; + + oldcxt = MemoryContextSwit
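For context, a minimal sketch of the workload this targets: many index tuples inserted into a BRIN index within a single command, where the revmap and BrinDesc can now be built once and reused across brininsert() calls instead of per tuple (table and index names are made up):

CREATE TABLE metrics (ts timestamptz, value float8);
CREATE INDEX metrics_ts_brin ON metrics USING brin (ts);
INSERT INTO metrics
    SELECT now() + make_interval(secs => i), random()
      FROM generate_series(1, 100000) AS i;   -- one command, many brininsert() calls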
Re: Inefficiency in parallel pg_restore with many tables
On Sat, Sep 02, 2023 at 11:55:21AM -0700, Nathan Bossart wrote: > I ended up hacking together a (nowhere near committable) patch to see how > hard it would be to allow using any type with binaryheap. It doesn't seem > too bad. I spent some more time on this patch and made the relevant adjustments to the rest of the set. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 3a51f25ae712b792bdde90b133c09b0b940f4198 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Mon, 4 Sep 2023 15:04:40 -0700 Subject: [PATCH v7 1/5] Allow binaryheap to use any type. We would like to make binaryheap available to frontend code in a future commit, but presently the implementation uses Datum for the node type, which is inaccessible in the frontend. This commit modifies the implementation to allow using any type for the nodes. binaryheap_allocate() now requires callers to specify the size of the node, and several of the other functions now use void pointers. --- src/backend/executor/nodeGatherMerge.c| 19 ++-- src/backend/executor/nodeMergeAppend.c| 22 ++--- src/backend/lib/binaryheap.c | 99 +++ src/backend/postmaster/pgarch.c | 36 --- .../replication/logical/reorderbuffer.c | 21 ++-- src/backend/storage/buffer/bufmgr.c | 20 ++-- src/include/lib/binaryheap.h | 21 ++-- 7 files changed, 137 insertions(+), 101 deletions(-) diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c index 9d5e1a46e9..f76406b575 100644 --- a/src/backend/executor/nodeGatherMerge.c +++ b/src/backend/executor/nodeGatherMerge.c @@ -52,7 +52,7 @@ typedef struct GMReaderTupleBuffer } GMReaderTupleBuffer; static TupleTableSlot *ExecGatherMerge(PlanState *pstate); -static int32 heap_compare_slots(Datum a, Datum b, void *arg); +static int32 heap_compare_slots(const void *a, const void *b, void *arg); static TupleTableSlot *gather_merge_getnext(GatherMergeState *gm_state); static MinimalTuple gm_readnext_tuple(GatherMergeState *gm_state, int nreader, bool nowait, bool *done); @@ -429,6 +429,7 @@ gather_merge_setup(GatherMergeState *gm_state) /* Allocate the resources for the merge */ gm_state->gm_heap = binaryheap_allocate(nreaders + 1, + sizeof(int), heap_compare_slots, gm_state); } @@ -489,7 +490,7 @@ reread: /* Don't have a tuple yet, try to get one */ if (gather_merge_readnext(gm_state, i, nowait)) binaryheap_add_unordered(gm_state->gm_heap, - Int32GetDatum(i)); + &i); } else { @@ -565,14 +566,14 @@ gather_merge_getnext(GatherMergeState *gm_state) * the heap, because it might now compare differently against the * other elements of the heap. */ - i = DatumGetInt32(binaryheap_first(gm_state->gm_heap)); + binaryheap_first(gm_state->gm_heap, &i); if (gather_merge_readnext(gm_state, i, false)) - binaryheap_replace_first(gm_state->gm_heap, Int32GetDatum(i)); + binaryheap_replace_first(gm_state->gm_heap, &i); else { /* reader exhausted, remove it from heap */ - (void) binaryheap_remove_first(gm_state->gm_heap); + binaryheap_remove_first(gm_state->gm_heap, NULL); } } @@ -585,7 +586,7 @@ gather_merge_getnext(GatherMergeState *gm_state) else { /* Return next tuple from whichever participant has the leading one */ - i = DatumGetInt32(binaryheap_first(gm_state->gm_heap)); + binaryheap_first(gm_state->gm_heap, &i); return gm_state->gm_slots[i]; } } @@ -750,11 +751,11 @@ typedef int32 SlotNumber; * Compare the tuples in the two given slots. 
*/ static int32 -heap_compare_slots(Datum a, Datum b, void *arg) +heap_compare_slots(const void *a, const void *b, void *arg) { GatherMergeState *node = (GatherMergeState *) arg; - SlotNumber slot1 = DatumGetInt32(a); - SlotNumber slot2 = DatumGetInt32(b); + SlotNumber slot1 = *((const int *) a); + SlotNumber slot2 = *((const int *) b); TupleTableSlot *s1 = node->gm_slots[slot1]; TupleTableSlot *s2 = node->gm_slots[slot2]; diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c index 21b5726e6e..e0ace294a0 100644 --- a/src/backend/executor/nodeMergeAppend.c +++ b/src/backend/executor/nodeMergeAppend.c @@ -52,7 +52,7 @@ typedef int32 SlotNumber; static TupleTableSlot *ExecMergeAppend(PlanState *pstate); -static int heap_compare_slots(Datum a, Datum b, void *arg); +static int heap_compare_slots(const void *a, const void *b, void *arg); /* @@ -125,8 +125,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags) mergestate->ms_nplans = nplans; mergestate->ms_slots = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * nplans); - mergestate->ms_heap = binaryheap_allocate(nplans, heap_compare_slots, - mergestate); + mergestate->ms_heap = binaryheap_allocate
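To make the shape of the reworked API concrete, here is a small usage sketch written against the signatures visible in the diff above (binaryheap_allocate() taking an explicit node size, and the accessors working through void pointers). binaryheap_build() and binaryheap_free() are assumed to keep their current signatures; this illustrates the proposed interface and is not code from the patch set:

```c
#include "lib/binaryheap.h"

/*
 * Comparator with the proposed const void * signature.  As today, the heap
 * keeps the element that compares largest at the top.
 */
static int
cmp_int(const void *a, const void *b, void *arg)
{
	int			ia = *(const int *) a;
	int			ib = *(const int *) b;

	return (ia > ib) - (ia < ib);
}

static void
heap_demo(void)
{
	binaryheap *heap;
	int			vals[] = {42, 7, 19};
	int			top;

	/* Node size is now passed explicitly instead of assuming Datum. */
	heap = binaryheap_allocate(8, sizeof(int), cmp_int, NULL);

	for (int i = 0; i < 3; i++)
		binaryheap_add_unordered(heap, &vals[i]);
	binaryheap_build(heap);

	/* Results come back through an out pointer rather than as a Datum. */
	binaryheap_first(heap, &top);	/* top == 42 */
	binaryheap_remove_first(heap, &top);

	binaryheap_free(heap);
}
```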
Re: backtrace_functions emits trace for any elog
On Mon, Sep 04, 2023 at 09:30:32PM +0100, Ilya Gladyshev wrote: > I used backtrace_functions to debug one of my ideas and found its behavior > counter-intuitive and contradictory to it own docs. I think the GUC is > supposed to be used to dump backtrace only on elog(ERROR) (should it also be > done for higher levels? not sure about this), but, in fact, it does that for > any log-level. I have attached a patch that checks log-level before attaching > backtrace. This would make the feature much less useful. Better to change the docs.
Re: Create shorthand for including all extra tests
On Mon, Sep 04, 2023 at 04:30:31PM -0400, Tom Lane wrote: > Noah Misch writes: > > On Mon, Sep 04, 2023 at 08:16:44PM +0200, Daniel Gustafsson wrote: > >> On 4 Sep 2023, at 17:01, Tom Lane wrote: > >>> I think this is a seriously bad idea. The entire point of not including > >>> certain tests in check-world by default is that the omitted tests are > >>> security hazards, so a developer or buildfarm owner should review each > >>> one before deciding whether to activate it on their machine. > > > Other than PG_TEST_EXTRA=wal_consistency_checking, they have the same > > hazard: > > they treat the loopback interface as private, so anyone with access to > > loopback interface ports can hijack the test. I'd be fine with e.g. > > PG_TEST_EXTRA=private-lo activating all of those. We don't gain by inviting > > the tester to review the tests to rediscover this common factor. > > Yeah, I could live with something like that from the security standpoint. > Not sure if it helps Nazir's use-case though. Maybe we could invent > categories that can be used in place of individual test names? > For now, > > PG_TEST_EXTRA="needs-private-lo slow" > > would cover the territory of "all", and I think it'd be very seldom > that we'd have to invent new categories here (though maybe I lack > imagination today). I could imagine categories for filesystem bytes and RAM bytes. Also, while needs-private-lo has a bounded definition, "slow" doesn't. If today's one "slow" test increases check-world duration by 1.1x, we may not let a 100x-increase test use the same keyword. If one introduced needs-private-lo, the present spelling of "all" would be "needs-private-lo wal_consistency_checking". Looks okay to me. Doing nothing here wouldn't be ruinous, of course.
Re: information_schema and not-null constraints
Alvaro Herrera writes: > In 0002, I took the tests added by Peter's proposed patch and put them > in a separate test file that runs at the end. There are some issues, > however. One is that the ORDER BY clause in the check_constraints view > is not fully deterministic, because the table name is not part of the > view definition, so we cannot sort by table name. I object very very strongly to this proposed test method. It completely undoes the work I did in v15 (cc50080a8 and related) to make the core regression test scripts mostly independent of each other. Even without considering the use-case of running a subset of the tests, the new test's expected output will constantly be needing updates as side effects of unrelated changes. regards, tom lane
backtrace_functions emits trace for any elog
Hi, I used backtrace_functions to debug one of my ideas and found its behavior counter-intuitive and contradictory to its own docs. I think the GUC is supposed to be used to dump a backtrace only on elog(ERROR) (should it also be done for higher levels? not sure about this), but, in fact, it does that for any log level. I have attached a patch that checks the log level before attaching the backtrace.

Regards, Ilya

0001-check-elevel-before-attaching-backtrace.patch Description: Binary data
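For context, the behaviour in question comes from the backtrace hook in errfinish() (src/backend/utils/error/elog.c), which currently keys only on the reporting function's name. A sketch of the kind of check the attached patch presumably adds -- shown here purely as an illustration, not the patch's actual contents:

```c
/*
 * Sketch only: today the condition in errfinish() attaches a backtrace
 * whenever the reporting function matches backtrace_functions, regardless
 * of severity.  Restricting it to ERROR and above could look roughly like
 * this; the added line is the elevel test.
 */
if (!edata->backtrace &&
	edata->funcname &&
	backtrace_functions &&
	edata->elevel >= ERROR &&		/* proposed: skip LOG, WARNING, etc. */
	matches_backtrace_functions(edata->funcname))
	set_backtrace(edata, 2);
```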
Re: Create shorthand for including all extra tests
Noah Misch writes: > On Mon, Sep 04, 2023 at 08:16:44PM +0200, Daniel Gustafsson wrote: >> On 4 Sep 2023, at 17:01, Tom Lane wrote: >>> I think this is a seriously bad idea. The entire point of not including >>> certain tests in check-world by default is that the omitted tests are >>> security hazards, so a developer or buildfarm owner should review each >>> one before deciding whether to activate it on their machine. > Other than PG_TEST_EXTRA=wal_consistency_checking, they have the same hazard: > they treat the loopback interface as private, so anyone with access to > loopback interface ports can hijack the test. I'd be fine with e.g. > PG_TEST_EXTRA=private-lo activating all of those. We don't gain by inviting > the tester to review the tests to rediscover this common factor. Yeah, I could live with something like that from the security standpoint. Not sure if it helps Nazir's use-case though. Maybe we could invent categories that can be used in place of individual test names? For now, PG_TEST_EXTRA="needs-private-lo slow" would cover the territory of "all", and I think it'd be very seldom that we'd have to invent new categories here (though maybe I lack imagination today). regards, tom lane
Re: Show various offset arrays for heap WAL records
On Mon, Jul 10, 2023 at 3:44 AM Heikki Linnakangas wrote: > I'm late to the party, but regarding commit c03c2eae0a, which added the > guidelines for writing formatting desc functions: > > You moved the comment from rmgrdesc_utils.c into rmgrdesc_utils.h, but I > don't think that was a good idea. Our usual convention is to have the > function comment in the .c file, not at the declaration in the header > file. When I want to know what a function does, I jump to the .c file, > and might miss the comment in the header entirely. > > Let's add a src/backend/access/rmgrdesc/README file. We don't currently > have any explanation anywhere why the rmgr desc functions are in a > separate directory. The README would be a good place to explain that, > and to have the formatting guidelines. See attached. diff --git a/src/backend/access/rmgrdesc/README b/src/backend/access/rmgrdesc/README new file mode 100644 index 00..abe84b9f11 --- /dev/null +++ b/src/backend/access/rmgrdesc/README @@ -0,0 +1,60 @@ +src/backend/access/rmgrdesc/README + +WAL resource manager description functions +== + +For debugging purposes, there is a "description function", or rmgrdesc +function, for each WAL resource manager. The rmgrdesc function parses the WAL +record and prints the contents of the WAL record in a somewhat human-readable +format. + +The rmgrdesc functions for all resource managers are gathered in this +directory, because they are also used in the stand-alone pg_waldump program. "standalone" seems the more common spelling of this adjective in the codebase today. +They could potentially be used by out-of-tree debugging tools too, although +the the functions or the output format should not be considered a stable API. You have an extra "the". I might phrase the last bit as "neither the description functions nor the output format should be considered part of a stable API" +Guidelines for rmgrdesc output format += I noticed you used === for both headings and wondered if it was intentional. Other READMEs I looked at in src/backend/access tend to have a single heading underlined with and then subsequent headings are underlined with -. I could see an argument either way here, but I just thought I would bring it up in case it was not a conscious choice. Otherwise, LGTM. - Melanie
Re: Commitfest 2023-09 starts soon
On Mon, Sep 4, 2023 at 1:01 PM Peter Eisentraut wrote: > > I have done several passes to make sure that patch statuses are more > accurate. As explained in a nearby message, I have set several patches > back from "Ready to Committer" to "Needs review" if additional > discussion happened past the first status change. I have also in > several cases removed reviewers from a patch entry if they haven't been > active on the thread in several months. (Feel free to sign up again, > but don't "block" the patch.) I had originally planned to write thread summaries for all status="Needs review" patches in the commitfest. However, ISTM these summaries are largely useful for reaching consensus on changing the status of these patches (setting them to "waiting on author", "returned with feedback", etc). Since Peter has already updated all of the statuses, I've decided not to write summaries and instead spend that time doing review. However, I thought it might be useful to provide a list of all of the "Needs review" entries which, at the time of writing, have had zero reviews or replies (except from the author): Document efficient self-joins / UPDATE LIMIT techniques. https://commitfest.postgresql.org/44/4539/ pg_basebackup: Always return valid temporary slot names https://commitfest.postgresql.org/44/4534/ pg_resetwal tests, logging, and docs update https://commitfest.postgresql.org/44/4533/ Streaming I/O, vectored I/O https://commitfest.postgresql.org/44/4532/ Improving the heapgetpage function improves performance in common scenarios https://commitfest.postgresql.org/44/4524/ CI speed improvements for FreeBSD https://commitfest.postgresql.org/44/4520/ Implementation of distinct in Window Aggregates: take two https://commitfest.postgresql.org/44/4519/ Improve pg_restore toc file parsing and format for better performances https://commitfest.postgresql.org/44/4509/ Fix false report of wraparound in pg_serial https://commitfest.postgresql.org/44/4516/ Simplify create_merge_append_path a bit for clarity https://commitfest.postgresql.org/44/4496/ Fix bogus Asserts in calc_non_nestloop_required_outer https://commitfest.postgresql.org/44/4478/ Retiring is_pushed_down https://commitfest.postgresql.org/44/4458/ Flush disk write caches by default on macOS and Windows https://commitfest.postgresql.org/44/4453/ Add last_commit_lsn to pg_stat_database https://commitfest.postgresql.org/44/4355/ Optimizing "boundary cases" during backward scan B-Tree index descents https://commitfest.postgresql.org/44/4380/ XLog size reductions: Reduced XLog record header size https://commitfest.postgresql.org/44/4386/ Unified file access using virtual file descriptors https://commitfest.postgresql.org/44/4420/ Optimise index range scan by performing first check with upper page boundary https://commitfest.postgresql.org/44/4434/ Revises for the check of parameterized partial paths https://commitfest.postgresql.org/44/4425/ Opportunistically pruning page before update https://commitfest.postgresql.org/44/4384/ Checks in RegisterBackgroundWorker() https://commitfest.postgresql.org/44/4514/ Allow direct lookups of SpecialJoinInfo by ojrelid https://commitfest.postgresql.org/44/4313/ Parent/child context relation in pg_get_backend_memory_contexts() https://commitfest.postgresql.org/44/4379/ Support Right Semi Join https://commitfest.postgresql.org/44/4284/ bug: ANALYZE progress report with inheritance tables https://commitfest.postgresql.org/44/4144/ archive modules loose ends https://commitfest.postgresql.org/44/4192/ - Melanie
Re: Create shorthand for including all extra tests
On Mon, Sep 04, 2023 at 08:16:44PM +0200, Daniel Gustafsson wrote: > > On 4 Sep 2023, at 17:01, Tom Lane wrote: > > Nazir Bilal Yavuz writes: > >> I created an 'all' option for PG_TEST_EXTRA to enable all test suites > >> defined under PG_TEST_EXTRA. > > > > I think this is a seriously bad idea. The entire point of not including > > certain tests in check-world by default is that the omitted tests are > > security hazards, so a developer or buildfarm owner should review each > > one before deciding whether to activate it on their machine. > > I dunno, I've certainly managed to not run the tests I hoped to multiple > times. > I think it could be useful for sandboxed testrunners which are destroyed after > each run. There is for sure a foot-gun angle to it, no question about that. Other than PG_TEST_EXTRA=wal_consistency_checking, they have the same hazard: they treat the loopback interface as private, so anyone with access to loopback interface ports can hijack the test. I'd be fine with e.g. PG_TEST_EXTRA=private-lo activating all of those. We don't gain by inviting the tester to review the tests to rediscover this common factor.
Re: Create shorthand for including all extra tests
> On 4 Sep 2023, at 17:01, Tom Lane wrote: > > Nazir Bilal Yavuz writes: >> I created an 'all' option for PG_TEST_EXTRA to enable all test suites >> defined under PG_TEST_EXTRA. > > I think this is a seriously bad idea. The entire point of not including > certain tests in check-world by default is that the omitted tests are > security hazards, so a developer or buildfarm owner should review each > one before deciding whether to activate it on their machine. I dunno, I've certainly managed to not run the tests I hoped to multiple times. I think it could be useful for sandboxed testrunners which are destroyed after each run. There is for sure a foot-gun angle to it, no question about that. -- Daniel Gustafsson
Re: Commitfest 2023-09 starts soon
On Mon, 4 Sept 2023 at 18:19, Aleksander Alekseev wrote: > > Hi Matthias, > > > I'm a bit confused about your use of "consensus". True, there was no > > objection, but it looks like no patch author or reviewer was informed > > (cc-ed or directly referenced) that the patch was being discussed > > before achieving this "consensus", and the "consensus" was reached > > within 4 days, of which 2 weekend, in a thread that has (until now) > > involved only you and Peter E. > > > > Usually, you'd expect discussion about a patch to happen on the > > patch's thread before any action is taken (or at least a mention on > > that thread), but quite clearly that hasn't happened here. > > Are patch authors expected to follow any and all discussion on threads > > with "Commitfest" in the title? > > If so, shouldn't the relevant wiki pages be updated, and/or the > > -hackers community be updated by mail in a new thread about these > > policy changes? > > I understand your disappointment and assure you that no one is acting > with bad intentions here. Also please note that English is a second > language for many of us which represents a challenge when it comes to > expressing thoughts on the mailing list. We have a common goal here, > to make PostgreSQL an even better system than it is now. > > The patches under question were in "Waiting for Author" state for a > *long* time and the authors were notified about this. We could toss > such patches from one CF to another month after month or mark as RwF > and let the author know that no one is going to review that patch > until the author takes the actions. It's been noted that the letter > approach is more productive in the long run. This far I agree - we can't keep patches around with issues if they're not being worked on. And I do appreciate your work on pruning dead or stale patches. But: > The discussion can > continue in the same thread and the same thread can be registered for > the upcoming CF. This is one of my major concerns here: Patch resolution is being discussed on -hackers, but outside of the thread used to discuss that patch (as indicated in the CF app), and without apparent author inclusion.To me, that feels like going behind the author's back, and I don't think that this should be normalized. As mentioned in the earlier mail, my other concern is the use of "consensus" in this context. You link to a message on -hackers, with no visible agreements. As a patch author myself, if a lack of comments on my patch in an otherwise unrelated thread is "consensus", then I'll probably move all patches that have yet to be commented on to RfC, as there'd be "consensus" that they should be included as-is in PostgreSQL. But I digress. I think it would be better to just remove the "consensus" part of your mail, and just put down the real reason why each patch is being RfC-ed or rejected. That is, don't imply that there are hackers that OK-ed it when there are none, and inform patch authors directly about the reasons why the patch is being revoked; so without "see consensus in [0]". Kind regards, Matthias van de Meent
Re: proposal: psql: show current user in prompt
po 4. 9. 2023 v 14:24 odesílatel Jelte Fennema napsal: > On Sun, 3 Sept 2023 at 20:58, Pavel Stehule > wrote: > > here is an try > > Overall it does what I had in mind. Below a few suggestions: > > +int > +PQprotocolSubversion(const PGconn *conn) > > Ugh, it's quite annoying that the original PQprotocolVersion only > returns the major version and thus we need this new function. It > seems like it would be much nicer if it returned a number similar to > PQserverVersion. I think it might be nicer to change PQprotocolVersion > to do that than to add another function. We could do: > > return PG_PROTOCOL_MAJOR(conn->pversion) * 100 + > PG_PROTOCOL_MINOR(conn->pversion); > > or even: > > if (PG_PROTOCOL_MAJOR(conn->pversion) == 3 && > PG_PROTOCOL_MINOR(conn->pversion)) > return 3; > else > return PG_PROTOCOL_MAJOR(conn->pversion) * 100 + > PG_PROTOCOL_MINOR(conn->pversion); > > The second option would be safest backwards compatibility wise, but in > practice you would only get another value than 3 (or 0) when > connecting to pre 7.4 servers. That seems old enough that I don't > think anyone is actually calling this function. **I'd like some > feedback from others on this though.** > Both versions look a little bit strange to me. I have not strong opinion about it, but I am not sure if is best to change contract after 20 years ago commit from Jun 2003 efc3a25bb02ada63158fe7006673518b005261ba I prefer to introduce a new function - it is ten lines of code. The form is not important - it can be a full number or minor number. It doesn't matter I think. But my opinion in this area is not strong, and I like to see feedback from others too. It is true that this feature and interface is not fully complete. Reards Pavel > + /* The protocol 3.0 is required */ > + if (PG_PROTOCOL_MAJOR(their_version) == 3) > + conn->pversion = their_version; > > Let's compare against the actual PG_PROTOCOL_EARLIEST and > PG_PROTOCOL_LATEST to determine if the version is supported or not. >
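For readers following the thread, the new function under discussion is indeed tiny. A minimal sketch of the shape Pavel describes, modelled on the existing PQprotocolVersion() and returning only the minor protocol number -- the attached patch may differ in detail:

```c
/*
 * Sketch of the proposed libpq accessor: expose the minor protocol
 * version, while PQprotocolVersion() keeps returning only the major
 * version, as it has since 2003.
 */
int
PQprotocolSubversion(const PGconn *conn)
{
	if (!conn)
		return 0;
	if (conn->status == CONNECTION_BAD)
		return 0;

	return PG_PROTOCOL_MINOR(conn->pversion);
}
```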
Re: PATCH: Add REINDEX tag to event triggers
On 01.09.23 18:10, jian he wrote: Thanks for pointing this out! Thanks for the fix! also due to commit https://git.postgresql.org/cgit/postgresql.git/commit/?id=11af63fb48d278b86aa948a5b57f136ef03c2bb7 ExecReindex function input arguments also changed. so we need to rebase this patch. change to currentReindexStatement = unconstify(ReindexStmt*, stmt); seems to work for me. I tested it, no warning. But I am not 100% sure. anyway I refactored the patch, making it git applyable also change from "currentReindexStatement = stmt;" to "currentReindexStatement = unconstify(ReindexStmt*, stmt);" due to ExecReindex function changes. LGTM. It applies and builds cleanly, all tests pass and documentation also builds ok. The CFbot seems also much happier now :) I tried this "small stress test" to see if the code was leaking .. but it all looks ok to me: DO $$ BEGIN FOR i IN 1..1 LOOP REINDEX INDEX reindex_test.reindex_test1_idx1; REINDEX TABLE reindex_test.reindex_tester1; END LOOP; END; $$; Jim
information_schema and not-null constraints
In reference to [1], 0001 attached to this email contains the updated view definitions that I propose. In 0002, I took the tests added by Peter's proposed patch and put them in a separate test file that runs at the end. There are some issues, however. One is that the ORDER BY clause in the check_constraints view is not fully deterministic, because the table name is not part of the view definition, so we cannot sort by table name. In the current regression database there is only one case[2] where two constraints have the same name and different definition: inh_check_constraint │ 2 │ ((f1 > 0)) NOT VALID ↵ │ │ ((f1 > 0)) (on tables invalid_check_con and invalid_check_con_child). I assume this is going to bite us at some point. We could just add a WHERE clause to omit that one constraint. Another issue I notice eyeballing at the results is that foreign keys on partitioned tables are listing the rows used to implement the constraints on partitions, which are sort-of "internal" constraints (and are not displayed by psql's \d). I hope this is a relatively simple fix that we could extract from the code used by psql. Anyway, I think I'm going to get 0001 committed sometime tomorrow, and then play a bit more with 0002 to try and get it pushed soon also. Thanks [1] https://postgr.es/m/81b461c4-edab-5d8c-2f88-203108425...@enterprisedb.com [2] select constraint_name, count(*), string_agg(distinct check_clause, E'\n') from information_schema.check_constraints group by constraint_name having count(*) > 1; -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "You don't solve a bad join with SELECT DISTINCT" #CupsOfFail https://twitter.com/connor_mc_d/status/1431240081726115845 >From d8a5f8103934fe65a83a2ca44f6af72449cb6aa9 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Mon, 4 Sep 2023 18:05:50 +0200 Subject: [PATCH v2 1/2] update information_schema definition --- src/backend/catalog/information_schema.sql | 74 -- 1 file changed, 27 insertions(+), 47 deletions(-) diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql index a06ec7a0a8..c402cca7f4 100644 --- a/src/backend/catalog/information_schema.sql +++ b/src/backend/catalog/information_schema.sql @@ -444,22 +444,19 @@ CREATE VIEW check_constraints AS WHERE pg_has_role(coalesce(c.relowner, t.typowner), 'USAGE') AND con.contype = 'c' -UNION +UNION ALL -- not-null constraints - -SELECT CAST(current_database() AS sql_identifier) AS constraint_catalog, - CAST(n.nspname AS sql_identifier) AS constraint_schema, - CAST(CAST(n.oid AS text) || '_' || CAST(r.oid AS text) || '_' || CAST(a.attnum AS text) || '_not_null' AS sql_identifier) AS constraint_name, -- XXX - CAST(a.attname || ' IS NOT NULL' AS character_data) - AS check_clause -FROM pg_namespace n, pg_class r, pg_attribute a -WHERE n.oid = r.relnamespace - AND r.oid = a.attrelid - AND a.attnum > 0 - AND NOT a.attisdropped - AND a.attnotnull - AND r.relkind IN ('r', 'p') - AND pg_has_role(r.relowner, 'USAGE'); +SELECT current_database()::information_schema.sql_identifier AS constraint_catalog, + rs.nspname::information_schema.sql_identifier AS constraint_schema, + con.conname::information_schema.sql_identifier AS constraint_name, + format('CHECK (%s IS NOT NULL)', at.attname)::information_schema.character_data AS check_clause + FROM pg_constraint con +LEFT JOIN pg_namespace rs ON rs.oid = con.connamespace +LEFT JOIN pg_class c ON c.oid = con.conrelid +LEFT JOIN pg_type t ON t.oid = con.contypid +LEFT JOIN pg_attribute at ON (con.conrelid = 
at.attrelid AND con.conkey[1] = at.attnum) + WHERE pg_has_role(coalesce(c.relowner, t.typowner), 'USAGE'::text) + AND con.contype = 'n'; GRANT SELECT ON check_constraints TO PUBLIC; @@ -826,6 +823,20 @@ CREATE VIEW constraint_column_usage AS AND r.relkind IN ('r', 'p') AND NOT a.attisdropped + UNION ALL + + /* not-null constraints */ +SELECT DISTINCT nr.nspname, r.relname, r.relowner, a.attname, nc.nspname, c.conname + FROM pg_namespace nr, pg_class r, pg_attribute a, pg_namespace nc, pg_constraint c + WHERE nr.oid = r.relnamespace +AND r.oid = a.attrelid +AND r.oid = c.conrelid +AND a.attnum = c.conkey[1] +AND c.connamespace = nc.oid +AND c.contype = 'n' +AND r.relkind in ('r', 'p') +AND not a.attisdropped + UNION ALL /* unique/primary key/foreign key constraints */ @@ -1828,6 +1839,7 @@ CREATE VIEW table_constraints AS CAST(r.relname AS sql_identifier) AS table_name, CAST( CASE c.contype WHEN 'c' THEN 'CHECK' +
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
Hi, > > Reviewing this now. I think it's almost ready to be committed. > > > > There's another big effort going on to move SLRUs to the regular buffer > > cache (https://commitfest.postgresql.org/43/3514/). I wonder how moving > > to 64 bit page numbers will affect that. BlockNumber is still 32 bits, > > after all. > > Somehow I didn't pay too much attention to this effort, thanks. I will > familiarize myself with the patch. Intuitively I don't think that the > patchse should block each other. > > > This patch makes the filenames always 12 characters wide (for SLRUs that > > opt-in to the new naming). That's actually not enough for the full range > > that a 64 bit page number can represent. Should we make it 16 characters > > now, to avoid repeating the same mistake we made earlier? Or make it > > more configurable, on a per-SLRU basis. One reason I don't want to just > > make it 16 characters is that WAL segment filenames are also 16 hex > > characters, which could cause confusion. > > Good catch. I propose the following solution: > > ``` > SlruFileName(SlruCtl ctl, char *path, int64 segno) > { > if (ctl->long_segment_names) > -return snprintf(path, MAXPGPATH, "%s/%012llX", ctl->Dir, > +/* > + * We could use 16 characters here but the disadvantage would be that > + * the SLRU segments will be hard to distinguish from WAL segments. > + * > + * For this reason we use 15 characters. It is enough but also means > + * that in the future we can't decrease SLRU_PAGES_PER_SEGMENT > easily. > + */ > +return snprintf(path, MAXPGPATH, "%s/%015llX", ctl->Dir, > (long long) segno); > else > return snprintf(path, MAXPGPATH, "%s/%04X", (ctl)->Dir, > ``` > > PFE the corrected patchset v58. While triaging the patches for the September CF [1] a consensus was reached that the patchset needs another round of review. Also I removed myself from the list of reviewers in order to make it clear that a review from somebody else would be appreciated. [1]: https://postgr.es/m/0737f444-59bb-ac1d-2753-873c40da0840%40eisentraut.org -- Best regards, Aleksander Alekseev
Re: Commitfest 2023-09 starts soon
Hi Matthias,

> I'm a bit confused about your use of "consensus". True, there was no
> objection, but it looks like no patch author or reviewer was informed
> (cc-ed or directly referenced) that the patch was being discussed
> before achieving this "consensus", and the "consensus" was reached
> within 4 days, of which 2 weekend, in a thread that has (until now)
> involved only you and Peter E.
>
> Usually, you'd expect discussion about a patch to happen on the
> patch's thread before any action is taken (or at least a mention on
> that thread), but quite clearly that hasn't happened here.
> Are patch authors expected to follow any and all discussion on threads
> with "Commitfest" in the title?
> If so, shouldn't the relevant wiki pages be updated, and/or the
> -hackers community be updated by mail in a new thread about these
> policy changes?

I understand your disappointment and assure you that no one is acting with bad intentions here. Also please note that English is a second language for many of us, which represents a challenge when it comes to expressing thoughts on the mailing list. We have a common goal here, to make PostgreSQL an even better system than it is now.

The patches under question were in "Waiting for Author" state for a *long* time and the authors were notified about this. We could toss such patches from one CF to another month after month, or mark them as RwF and let the author know that no one is going to review the patch until the author takes action. It's been noted that the latter approach is more productive in the long run. The discussion can continue in the same thread, and the same thread can be registered for the upcoming CF.

This being said, Peter is the CF manager, so he has every right to change the status of the patches under question if he disagrees.

-- Best regards, Aleksander Alekseev
Re: SLRUs in the main buffer pool, redux
Hi, > > Unfortunately the patchset rotted quite a bit since February and needs a > > rebase. > > A consensus was reached to mark this patch as RwF for now. There > are many patches to be reviewed and this one doesn't seem to be in the > best shape, so we have to prioritise. Please feel free re-submitting > the patch for the next commitfest. See also [1] """ [...] Also, please consider joining the efforts and having one thread with a single patchset rather than different threads with different competing patches. This will simplify the work of the reviewers a lot. Personally I would suggest taking one step back and agree on a particular RFC first and then continue working on a single patchset according to this RFC. We did it in the past in similar cases and this approach proved to be productive. [...] """ [1]: https://postgr.es/m/CAJ7c6TME5Z8k4undYUmKavD_dQFL0ujA%2BzFCK1eTH0_pzM%3DXrA%40mail.gmail.com -- Best regards, Aleksander Alekseev
Re: SLRUs in the main buffer pool - Page Header definitions
Hi, > Agreed that we'd certainly want to make sure it's all correct and to do > performance testing but in terms of how many buffers... isn't much of > the point of this that we have data in memory because it's being used > and if it's not then we evict it? That is, I wouldn't think we'd have > set parts of the buffer pool for SLRUs vs. regular data but would let > the actual demand drive what pages are in the cache and I thought that > was part of this proposal and part of the reasoning behind making this > change. > > There's certainly an argument to be made that our current cache > management strategy doesn't account very well for the value of pages > (eg: btree root pages vs. random heap pages, perhaps) and that'd > certainly be a good thing to improve on, but that's independent of this. > If it's not, then that's certainly moving the goal posts a very long way > in terms of this effort. During the triage of the patches submitted for the September CF [1] I noticed that the corresponding CF entry [2] has two threads linked. Only the first thread was used by CF bot [3], also Heikki and Thomas were listed as the authors. The patch from the first thread rotted and was not updated for some time which resulted in marking the patch as RwF for now [4] It looks like the patch in *this* thread was never registered on the commitfest and/or tested by CF bot, unless I'm missing something. Unfortunately it's a bit late to register it for the September CF especially considering the fact that it doesn't apply at the moment. This being said, please consider submitting the patch for the upcoming CF. Also, please consider joining the efforts and having one thread with a single patchset rather than different threads with different competing patches. This will simplify the work of the reviewers a lot. Personally I would suggest taking one step back and agree on a particular RFC first and then continue working on a single patchset according to this RFC. We did it in the past in similar cases and this approach proved to be productive. [1]: https://postgr.es/m/0737f444-59bb-ac1d-2753-873c40da0840%40eisentraut.org [2]: https://commitfest.postgresql.org/44/3514/ [3]: http://cfbot.cputube.org/ [4]: https://postgr.es/m/CAJ7c6TN%3D1EF1bTA6w8W%3D0e_Bj%2B-jsiHK0ap1uC_ZUGjwu%2B4JGw%40mail.gmail.com -- Best regards, Aleksander Alekseev
Re: Commitfest 2023-09 starts soon
On Thu, 31 Aug 2023 at 14:35, Aleksander Alekseev wrote: > > Hi, > > On Thu, 31 Aug 2023 at 11:37, Peter Eisentraut wrote: > > > There are a number of patches carried over from the PG16 development > > > cycle that have been in "Waiting on author" for several months. I will > > > aggressively prune those after the start of this commitfest if there > > > hasn't been any author activity by then. > > > > [1 patch] > > This was the one that I could name off the top of my head. > > [5 more patches] > > I will apply corresponding status changes if there will be no objections. On Mon, 4 Sept 2023 at [various], Aleksander Alekseev wrote: > > Hi, > > > [various patches] > > A consensus was reached [1] to mark this patch as RwF for now. There > are many patches to be reviewed and this one doesn't seem to be in the > best shape, so we have to prioritise. Please feel free re-submitting > the patch for the next commitfest. I'm a bit confused about your use of "consensus". True, there was no objection, but it looks like no patch author or reviewer was informed (cc-ed or directly referenced) that the patch was being discussed before achieving this "consensus", and the "consensus" was reached within 4 days, of which 2 weekend, in a thread that has (until now) involved only you and Peter E. Usually, you'd expect discussion about a patch to happen on the patch's thread before any action is taken (or at least a mention on that thread), but quite clearly that hasn't happened here. Are patch authors expected to follow any and all discussion on threads with "Commitfest" in the title? If so, shouldn't the relevant wiki pages be updated, and/or the -hackers community be updated by mail in a new thread about these policy changes? Kind regards, Matthias van de Meent Neon (https://neon.tech) [0] https://wiki.postgresql.org/wiki/Submitting_a_Patch
Re: Query execution in Perl TAP tests needs work
On 2023-09-02 Sa 20:17, Thomas Munro wrote: On Sun, Sep 3, 2023 at 6:42 AM Andrew Dunstan wrote: I confess I'm a little reluctant to impose this burden on buildfarm owners. We should think about some sort of fallback in case this isn't supported on some platform, either due to technological barriers or buildfarm owner reluctance. I guess you're thinking that it could be done in such a way that it is automatically used for $node->safe_psql() and various other things if Platypus is detected, but otherwise forking psql as now, for a transition period? Then we could nag build farm owners, and eventually turn off the fallback stuff after N months. After that it would begin to be possible to use this in more interesting and advanced ways. Yeah, that would be ideal. I'll prep a version of the module that doesn't fail if FFI::Platypus isn't available, but instead sets a flag we can test. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: Suppressing compiler warning on Debian 12/gcc 12.2.0
On Thu, Aug 31, 2023 at 03:25:09PM -0400, Bruce Momjian wrote: > Being a new user of Debian 12/gcc 12.2.0, I wrote the following shell > script to conditionally add gmake rules with compiler flags to > src/Makefile.custom to suppress warnings for certain files. This allows > me to compile all supported Postgres releases without warnings. > > I actually didn't how simple it was to add per-file compile flags until > I read: > > > https://stackoverflow.com/questions/6546162/how-to-add-different-rules-for-specific-files This might be simpler for people to modify since it abstracts out the version checking. --- # gmake per-file options: https://stackoverflow.com/questions/6546162/how-to-add-different-rules-for-specific-files for FILE in configure.in configure.ac do if [ -e "$FILE" ] thenVERSION=$(sed -n 's/^AC_INIT(\[PostgreSQL\], \[\([0-9]\+\).*$/\1/p' "$FILE") break fi done [ -z "$VERSION" ] && echo 'Could not find Postgres version' 1>&2 && exit 1 if [ "$VERSION" -eq 11 ] thencat >> src/Makefile.custom <> src/Makefile.custom
Re: Create shorthand for including all extra tests
Nazir Bilal Yavuz writes: > I created an 'all' option for PG_TEST_EXTRA to enable all test suites > defined under PG_TEST_EXTRA. I think this is a seriously bad idea. The entire point of not including certain tests in check-world by default is that the omitted tests are security hazards, so a developer or buildfarm owner should review each one before deciding whether to activate it on their machine. regards, tom lane
Create shorthand for including all extra tests
Hi, I realized that I forgot to add the new extra test to my test scripts. So, I thought maybe we can use shorthand for including all extra tests. With that, running a full testsuite is easier without having to keep up with new tests and updates. I created an 'all' option for PG_TEST_EXTRA to enable all test suites defined under PG_TEST_EXTRA. I created the check_extra_tests_enabled() function in the Test/Utils.pm file. This function takes the test's name as an input and checks if PG_TEST_EXTRA contains 'all' or this test's name. I thought another advantage could be that this can be used in CI. But when 'wal_consistency_checking' is enabled, CI times get longer since it does resource intensive operations. Any kind of feedback would be appreciated. Regards, Nazir Bilal Yavuz Microsoft From be91a0aaf926c83bf266f92e0523f41ca333b048 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Mon, 4 Sep 2023 16:58:42 +0300 Subject: [PATCH v1] Create shorthand for including all extra tests This patch aims to make running full testsuite easier without having to keep up with new tests and updates. Create 'all' option for PG_TEST_EXTRA to enable all test suites defined under PG_TEST_EXTRA. That is achieved by creating check_extra_tests_enabled() function in Test/Utils.pm file. This function takes the test's name as an input and checks if PG_TEST_EXTRA contains 'all' or this test's name. --- doc/src/sgml/regress.sgml| 9 + src/interfaces/libpq/t/004_load_balance_dns.pl | 2 +- src/test/kerberos/t/001_auth.pl | 2 +- src/test/ldap/t/001_auth.pl | 2 +- src/test/ldap/t/002_bindpasswd.pl| 2 +- src/test/modules/Makefile| 2 +- .../t/001_mutated_bindpasswd.pl | 2 +- src/test/perl/PostgreSQL/Test/Utils.pm | 16 src/test/recovery/t/027_stream_regress.pl| 3 +-- src/test/ssl/t/001_ssltests.pl | 2 +- src/test/ssl/t/002_scram.pl | 2 +- src/test/ssl/t/003_sslinfo.pl| 2 +- 12 files changed, 35 insertions(+), 11 deletions(-) diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml index 675db86e4d7..40269c258ef 100644 --- a/doc/src/sgml/regress.sgml +++ b/doc/src/sgml/regress.sgml @@ -262,6 +262,15 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance' The following values are currently supported: + + all + + + Enables all extra tests. 
+ + + + kerberos diff --git a/src/interfaces/libpq/t/004_load_balance_dns.pl b/src/interfaces/libpq/t/004_load_balance_dns.pl index 875070e2120..62eeb21843e 100644 --- a/src/interfaces/libpq/t/004_load_balance_dns.pl +++ b/src/interfaces/libpq/t/004_load_balance_dns.pl @@ -6,7 +6,7 @@ use PostgreSQL::Test::Utils; use PostgreSQL::Test::Cluster; use Test::More; -if ($ENV{PG_TEST_EXTRA} !~ /\bload_balance\b/) +if (!PostgreSQL::Test::Utils::check_extra_text_enabled('load_balance')) { plan skip_all => 'Potentially unsafe test load_balance not enabled in PG_TEST_EXTRA'; diff --git a/src/test/kerberos/t/001_auth.pl b/src/test/kerberos/t/001_auth.pl index 0deb9bffc8d..59574178afc 100644 --- a/src/test/kerberos/t/001_auth.pl +++ b/src/test/kerberos/t/001_auth.pl @@ -28,7 +28,7 @@ if ($ENV{with_gssapi} ne 'yes') { plan skip_all => 'GSSAPI/Kerberos not supported by this build'; } -elsif ($ENV{PG_TEST_EXTRA} !~ /\bkerberos\b/) +elsif (!PostgreSQL::Test::Utils::check_extra_text_enabled('kerberos')) { plan skip_all => 'Potentially unsafe test GSSAPI/Kerberos not enabled in PG_TEST_EXTRA'; diff --git a/src/test/ldap/t/001_auth.pl b/src/test/ldap/t/001_auth.pl index 3e113fd6ebb..9db07e801e9 100644 --- a/src/test/ldap/t/001_auth.pl +++ b/src/test/ldap/t/001_auth.pl @@ -18,7 +18,7 @@ if ($ENV{with_ldap} ne 'yes') { plan skip_all => 'LDAP not supported by this build'; } -elsif ($ENV{PG_TEST_EXTRA} !~ /\bldap\b/) +elsif (!PostgreSQL::Test::Utils::check_extra_text_enabled('ldap')) { plan skip_all => 'Potentially unsafe test LDAP not enabled in PG_TEST_EXTRA'; diff --git a/src/test/ldap/t/002_bindpasswd.pl b/src/test/ldap/t/002_bindpasswd.pl index bcd4aa2b742..a1b1bd8c22f 100644 --- a/src/test/ldap/t/002_bindpasswd.pl +++ b/src/test/ldap/t/002_bindpasswd.pl @@ -18,7 +18,7 @@ if ($ENV{with_ldap} ne 'yes') { plan skip_all => 'LDAP not supported by this build'; } -elsif ($ENV{PG_TEST_EXTRA} !~ /\bldap\b/) +elsif (!PostgreSQL::Test::Utils::check_extra_text_enabled('ldap')) { plan skip_all => 'Potentially unsafe test LDAP not enabled in PG_TEST_EXTRA'; diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile index 6331c976dcb..2fdcff24785 100644 --- a/src/test/modules/Makefile +++ b/src/test/modules/Makefile @@ -43,7 +43,7 @@ endif # Test runs an LDAP server, so only run if ldap is in PG_TEST_EXTRA ifeq ($(with_ldap),yes) -ifneq (,$(filt
Re: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases
Hi, On Mon, Sep 4, 2023 at 9:38 PM Zhijie Hou (Fujitsu) wrote: > > On Wednesday, July 5, 2023 2:53 PM Masahiko Sawada > wrote: > > Hi, > > > Thanks for reviewing the patch. Pushed. > > My colleague Vignesh noticed that the newly added test cases were failing in > BF animal sungazer[1]. Thank you for reporting! > > The test failed to drop the slot which is active on publisher. > > --error-log-- > This failure is because pg_drop_replication_slot fails with "replication slot > "test_tab2_sub" is active for PID 55771638": > 2023-09-02 09:00:04.806 UTC [12910732:4] 026_stats.pl LOG: statement: SELECT > pg_drop_replication_slot('test_tab2_sub') > 2023-09-02 09:00:04.807 UTC [12910732:5] 026_stats.pl ERROR: replication > slot "test_tab2_sub" is active for PID 55771638 > 2023-09-02 09:00:04.807 UTC [12910732:6] 026_stats.pl STATEMENT: SELECT > pg_drop_replication_slot('test_tab2_sub') > - > > I the reason is that the test DISABLEd the subscription before dropping the > slot, while "ALTER SUBSCRIPTION DISABLE" doesn't wait for the walsender to > release the slot, so it's possible that the walsender is still alive when > calling > pg_drop_replication_slot() to drop the slot on publisher(pg_drop_xxxslot() > doesn't wait for slot to be released). I agree with your analysis. > > I think we can wait for the slot to become inactive before dropping like: > $node_primary->poll_query_until('otherdb', > "SELECT NOT EXISTS (SELECT 1 FROM pg_replication_slots WHERE > active_pid IS NOT NULL)" > ) > I prefer this approach but it would be better to specify the slot name in the query. > Or we can just don't drop the slot as it’s the last testcase. Since we might add other tests after that in the future, I think it's better to drop the replication slot (and subscription). > > Another thing might be worth considering is we can add one parameter in > pg_drop_replication_slot() to let user control whether to wait or not, and the > test can be fixed as well by passing nowait=false to the func. While it could be useful in general we cannot use this approach for this issue since it cannot be backpatched to older branches. We might want to discuss it on a new thread. I've attached a draft patch to fix this issue. Feedback is very welcome. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com fix.patch Description: Binary data
Re: Commitfest 2023-09 starts soon
Hi Peter, > The patch was first set to "Ready for Committer" on 2023-03-29, and if I > pull up the thread in the web archive view, that is in the middle of the > page. So as a committer, I would expect that someone would review > whatever happened in the second half of that thread before turning it > over to committer. > > As a general rule, if significant additional discussion or patch posting > happens after a patch is set to "Ready for Committer", I'm setting it > back to "Needs review" until someone actually re-reviews it. OK, fair enough. > I also notice that you are listed as both author and reviewer of that > patch, which I think shouldn't be done. It appears that you are in fact > the author, so I would recommend that you remove yourself from the > reviewers. Sometimes I start as a reviewer and then for instance add my own patches to the thread. In cases like this I end up being both an author and a reviewer, but it doesn't mean that I review my own patches :) In this particular case IMO it would be appropriate to remove myself from the list of reviewers. So I did. Thanks! -- Best regards, Aleksander Alekseev
Re: Extract numeric filed in JSONB more effectively
Hi, v13 attached. Changes includes: 1. fix the bug Jian provides. 2. reduce more code duplication without DirectFunctionCall. 3. add the overlooked jsonb_path_query and jsonb_path_query_first as candidates -- Best Regards Andy Fan v13-0001-optimize-casting-jsonb-to-a-given-type.patch Description: Binary data
Re: Commitfest 2023-09 starts soon
I have done several passes to make sure that patch statuses are more accurate. As explained in a nearby message, I have set several patches back from "Ready to Committer" to "Needs review" if additional discussion happened past the first status change. I have also in several cases removed reviewers from a patch entry if they haven't been active on the thread in several months. (Feel free to sign up again, but don't "block" the patch.) I was going to say, unfortunately that means that there is now even more work to do, but in fact, it seems this has all kind of balanced out, and I hope it's now more accurate for someone wanting to pick up some work: Status summary: Needs review: 224. Waiting on Author: 24. Ready for Committer: 31. Committed: 41. Returned with Feedback: 14. Withdrawn: 2. Rejected: 2. Total: 338. (And yes, the total number of patches has grown since the commitfest has started!?!) On 01.09.23 14:55, Peter Eisentraut wrote: Okay, here we go, starting with: Status summary: Needs review: 227. Waiting on Author: 37. Ready for Committer: 30. Committed: 40. Rejected: 1. Returned with Feedback: 1. Withdrawn: 1. Total: 337. (which is less than CF 2023-07) I have also already applied one round of the waiting-on-author-pruning described below (not included in the above figures). On 31.08.23 10:36, Peter Eisentraut wrote: Commitfest 2023-09 (https://commitfest.postgresql.org/44/) starts in less than 28 hours. If you have any patches you would like considered, be sure to add them in good time. All patch authors, and especially experienced hackers, are requested to make sure the patch status is up to date. If the patch is still being worked on, it might not need to be in "Needs review". Conversely, if you are hoping for a review but the status is "Waiting on author", then it will likely be ignored by reviewers. There are a number of patches carried over from the PG16 development cycle that have been in "Waiting on author" for several months. I will aggressively prune those after the start of this commitfest if there hasn't been any author activity by then.
Re: Commitfest 2023-09 starts soon
On 04.09.23 15:22, Aleksander Alekseev wrote: Hi Peter, Okay, here we go, starting with: Status summary: Needs review: 227. Waiting on Author: 37. Ready for Committer: 30. Committed: 40. Rejected: 1. Returned with Feedback: 1. Withdrawn: 1. Total: 337. (which is less than CF 2023-07) I have also already applied one round of the waiting-on-author-pruning described below (not included in the above figures). * Index SLRUs by 64-bit integers rather than by 32-bit integers https://commitfest.postgresql.org/44/3489/ The status here was changed to "Needs Review". These patches are in good shape and previously were marked as "Ready for Committer". Actually I thought Heikki would commit them to PG16, but it didn't happen. If there are no objections, I will return the RfC status in a bit since it seems to be more appropriate in this case. The patch was first set to "Ready for Committer" on 2023-03-29, and if I pull up the thread in the web archive view, that is in the middle of the page. So as a committer, I would expect that someone would review whatever happened in the second half of that thread before turning it over to committer. As a general rule, if significant additional discussion or patch posting happens after a patch is set to "Ready for Committer", I'm setting it back to "Needs review" until someone actually re-reviews it. I also notice that you are listed as both author and reviewer of that patch, which I think shouldn't be done. It appears that you are in fact the author, so I would recommend that you remove yourself from the reviewers.
Re: Unlogged relation copy is not fsync'd
On Fri, Aug 25, 2023 at 8:47 AM Heikki Linnakangas wrote: > > I noticed another missing fsync() with unlogged tables, similar to the > one at [1]. > > RelationCopyStorage does this: > > > /* > >* When we WAL-logged rel pages, we must nonetheless fsync them. The > >* reason is that since we're copying outside shared buffers, a > > CHECKPOINT > >* occurring during the copy has no way to flush the previously > > written > >* data to disk (indeed it won't know the new rel even exists). A > > crash > >* later on would replay WAL from the checkpoint, therefore it > > wouldn't > >* replay our earlier WAL entries. If we do not fsync those pages > > here, > >* they might still not be on disk when the crash occurs. > >*/ > > if (use_wal || copying_initfork) > > smgrimmedsync(dst, forkNum); > > That 'copying_initfork' condition is wrong. The first hint of that is > that 'use_wal' is always true for an init fork. I believe this was meant > to check for 'relpersistence == RELPERSISTENCE_UNLOGGED'. Otherwise, > this bad thing can happen: > > 1. Create an unlogged table > 2. ALTER TABLE unlogged_tbl SET TABLESPACE ... -- This calls > RelationCopyStorage > 3. a checkpoint happens while the command is running > 4. After the ALTER TABLE has finished, shut down postgres cleanly. > 5. Lose power > > When you recover, the unlogged table is not reset, because it was a > clean postgres shutdown. But the part of the file that was copied after > the checkpoint at step 3 was never fsync'd, so part of the file is lost. > I was able to reproduce with a VM that I kill to simulate step 5. > > This is the same scenario that the smgrimmedsync() call above protects > from for WAL-logged relations. But we got the condition wrong: instead > of worrying about the init fork, we need to call smgrimmedsync() for all > *other* forks of an unlogged relation. > > Fortunately the fix is trivial, see attached. I suspect we have similar > problems in a few other places, though. end_heap_rewrite(), _bt_load(), > and gist_indexsortbuild have a similar-looking smgrimmedsync() calls, > with no consideration for unlogged relations at all. I haven't tested > those, but they look wrong to me. The patch looks reasonable to me. Is this [1] case in hash index build that I reported but didn't take the time to reproduce similar? - Melanie [1] https://www.postgresql.org/message-id/CAAKRu_bPc81M121pOEU7W%3D%2BwSWEebiLnrie4NpaFC%2BkWATFtSA%40mail.gmail.com
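To spell out the fix Heikki describes: the condition quoted above needs to treat every fork of an unlogged relation the way it already treats WAL-logged copies. A sketch of the corrected check in RelationCopyStorage(), assuming relpersistence is available at that point; the actual patch may phrase it differently:

```c
/*
 * Sketch of the corrected condition: fsync not only when the pages were
 * WAL-logged, but for every fork of an unlogged relation.  Otherwise a
 * clean shutdown after the copy leaves the unlogged table un-reset while
 * part of the new file was never flushed to disk.
 */
if (use_wal || relpersistence == RELPERSISTENCE_UNLOGGED)
	smgrimmedsync(dst, forkNum);
```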
Re: Commitfest 2023-09 starts soon
Hi Peter, > Okay, here we go, starting with: > > Status summary: Needs review: 227. Waiting on Author: 37. Ready for > Committer: 30. Committed: 40. Rejected: 1. Returned with Feedback: 1. > Withdrawn: 1. Total: 337. > > (which is less than CF 2023-07) > > I have also already applied one round of the waiting-on-author-pruning > described below (not included in the above figures). * Index SLRUs by 64-bit integers rather than by 32-bit integers https://commitfest.postgresql.org/44/3489/ The status here was changed to "Needs Review". These patches are in good shape and previously were marked as "Ready for Committer". Actually I thought Heikki would commit them to PG16, but it didn't happen. If there are no objections, I will return the RfC status in a bit since it seems to be more appropriate in this case. -- Best regards, Aleksander Alekseev
Re: pgsql: Allow tailoring of ICU locales with custom rules
On Tue, Aug 22, 2023 at 10:55 PM Jeff Davis wrote: > > On Mon, 2023-08-14 at 10:34 +0200, Peter Eisentraut wrote: > > I have investigated this. My assessment is that how PostgreSQL > > interfaces with ICU is correct. Whether what ICU does is correct > > might > > be debatable. I have filed a bug with ICU about this: > > https://unicode-org.atlassian.net/browse/ICU-22456 , but there is no > > response yet. > > Is everything other than the language and region simply discarded when > a rules string is present, or are some attributes preserved, or is > there some other nuance? > > > You can work around this by including the desired attributes in the > > rules string, for example > > > > create collation c3 (provider=icu, > >locale='und-u-ka-shifted-ks-level1', > >rules='[alternate shifted][strength 1]', > >deterministic=false); > > > > So I don't think there is anything we need to do here for PostgreSQL > > 16. > > Is there some way we can warn a user that some attributes will be > discarded, or improve the documentation? Letting the user figure this > out for themselves doesn't seem right. > > Are you sure we want to allow rules for the database default collation > in 16, or should we start with just allowing them in CREATE COLLATION > and then expand to the database default collation later? I'm still a > bit concerned about users getting too fancy with daticurules, and > ending up not being able to connect to their database anymore. > There is still an Open Item corresponding to this. Does anyone else want to weigh in? -- With Regards, Amit Kapila.
Re: Speaker Bureau
Added my name to the list. I am available to travel for meetups. -- Best Wishes, Ashutosh Bapat On Sat, Sep 2, 2023 at 12:30 AM Dave Cramer wrote: > > Greetings, > > If you are on the speaker list can you send an email to ugc...@postgresql.us > indicating whether you are available to travel for meetups? > > This serves the obvious purpose but also provides your email address to us. > > Thanks, > > Dave Cramer
Re: trying again to get incremental backup
On Wed, Aug 30, 2023 at 9:20 PM Robert Haas wrote: > Unless someone has a brilliant idea that I lack, this suggests to me > that this whole line of testing is a dead end. I can, of course, write > tests that compare clusters *logically* -- do the correct relations > exist, are they accessible, do they have the right contents? Can't we think of comparing at the block level, like we can compare each block but ignore the content of the hole? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
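For what it's worth, a block-level comparison that ignores the hole could be sketched along these lines; this is illustrative only, assumes a standard page layout on both sides, and the function name is made up for the example:

/*
 * Rough sketch: compare two pages while ignoring the unused "hole" between
 * pd_lower and pd_upper.  Real test code would also need to handle new or
 * empty pages and non-standard page layouts.
 */
static bool
pages_equal_ignoring_hole(Page a, Page b)
{
    PageHeader  pha = (PageHeader) a;
    PageHeader  phb = (PageHeader) b;

    if (pha->pd_lower != phb->pd_lower ||
        pha->pd_upper != phb->pd_upper)
        return false;

    /* compare the bytes before the hole and the bytes after it */
    return memcmp(a, b, pha->pd_lower) == 0 &&
           memcmp((char *) a + pha->pd_upper,
                  (char *) b + phb->pd_upper,
                  BLCKSZ - pha->pd_upper) == 0;
}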
RE: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases
On Wednesday, July 5, 2023 2:53 PM Masahiko Sawada wrote: Hi, > Thanks for reviewing the patch. Pushed. My colleague Vignesh noticed that the newly added test cases were failing in BF animal sungazer[1]. The test failed to drop the slot which is active on the publisher. --error-log-- This failure is because pg_drop_replication_slot fails with "replication slot "test_tab2_sub" is active for PID 55771638": 2023-09-02 09:00:04.806 UTC [12910732:4] 026_stats.pl LOG: statement: SELECT pg_drop_replication_slot('test_tab2_sub') 2023-09-02 09:00:04.807 UTC [12910732:5] 026_stats.pl ERROR: replication slot "test_tab2_sub" is active for PID 55771638 2023-09-02 09:00:04.807 UTC [12910732:6] 026_stats.pl STATEMENT: SELECT pg_drop_replication_slot('test_tab2_sub') - I think the reason is that the test DISABLEd the subscription before dropping the slot, while "ALTER SUBSCRIPTION DISABLE" doesn't wait for the walsender to release the slot, so it's possible that the walsender is still alive when calling pg_drop_replication_slot() to drop the slot on the publisher (pg_drop_xxxslot() doesn't wait for the slot to be released). I think we can wait for the slot to become inactive before dropping it, like: $node_primary->poll_query_until('otherdb', "SELECT NOT EXISTS (SELECT 1 FROM pg_replication_slots WHERE active_pid IS NOT NULL)" ) Or we can just not drop the slot, as it's the last test case. Another thing that might be worth considering is adding a parameter to pg_drop_replication_slot() to let the user control whether to wait or not; the test could then be fixed as well by passing nowait=false to the function. [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2023-09-02%2002%3A17%3A01 Best Regards, Hou zj
Re: A minor adjustment to get_cheapest_path_for_pathkeys
Hi, >> I see the reasoning behind the proposed change, but I'm not convinced >> that there will be any measurable performance improvements. Firstly, >> compare_path_costs() is rather cheap. Secondly, require_parallel_safe >> is `false` in most of the cases. Last but not least, one should prove >> that this particular place is a bottleneck under given loads. I doubt >> it is. Most of the time it's a network, disk I/O or locks. >> >> So unless the author can provide benchmarks that show measurable >> benefits of the change I suggest rejecting it. > > Hmm, I doubt that there would be any measurable performance gains from > this minor tweak. I think this tweak is more about being cosmetic. But > I'm OK if it is deemed unnecessary and thus rejected. During the triage of the patches submitted for the September CF a consensus was reached [1] to mark this patch as Rejected. [1]: https://postgr.es/m/0737f444-59bb-ac1d-2753-873c40da0840%40eisentraut.org -- Best regards, Aleksander Alekseev
Re: [17] CREATE SUBSCRIPTION ... SERVER
On Sat, Sep 2, 2023 at 1:41 AM Robert Haas wrote: > > On Fri, Sep 1, 2023 at 4:04 PM Jeff Davis wrote: > > On Thu, 2023-08-31 at 17:17 -0400, Joe Conway wrote: > > > Maybe move postgres_fdw to be a first class built in feature instead > > > of > > > an extension? > > > > That could make sense, but we still have to solve the problem of how to > > present a built-in FDW. > > > > FDWs don't have a schema, so it can't be inside pg_catalog. So we'd > > need some special logic somewhere to make pg_dump and psql \dew work as > > expected, and I'm not quite sure what to do there. > > I'm worried that an approach based on postgres_fdw would have security > problems. I think that we don't want postgres_fdw installed in every > PostgreSQL cluster for security reasons. And I think that the set of > people who should be permitted to manage connection strings for > logical replication subscriptions could be different from the set of > people who are entitled to use postgres_fdw. If postgres_fdw was the only way to specify a connection to be used with subscriptions, what you are saying makes sense. But it's not. We will continue to support current mechanism which doesn't require postgres_fdw to be installed on every PostgreSQL cluster. What security problems do you foresee if postgres_fdw is used in addition to the current mechanism? -- Best Wishes, Ashutosh Bapat
Re: Avoid unncessary always true test (src/backend/storage/buffer/bufmgr.c)
On 30.06.23 20:48, Karina Litskevich wrote: So as for me, calling LockRelationForExtension() and UnlockRelationForExtension() without testing eb.rel first looks more like a bug here. However, they are never actually called with eb.rel=NULL because of the EB_* flags, so there is no bug here. I believe we should add an Assert checking that when eb.rel is NULL, the flags are such that we won't use eb.rel. And yes, we can remove unnecessary checks where the flags value guarantees that eb.rel is not NULL. And another minor observation: it seems to me that we don't need the "could have been closed while waiting for lock" comment in ExtendBufferedRelShared(), because I believe the comment above already explains why we update eb.smgr: * Note that another backend might have extended the relation by the time * we get the lock. I attached the new version of the patch as I see it. This patch version looks the most sensible to me. But as commented further downthread, some explanation around the added assertions would be good.
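For illustration, the suggested assertion might look roughly like the sketch below; whether EB_SKIP_EXTENSION_LOCK is the only flag that matters here is an assumption that would need to be checked against the callers:

/*
 * Sketch of the proposed check: a caller that passes a bare SMgrRelation
 * (eb.rel == NULL) must also pass EB_SKIP_EXTENSION_LOCK, so that
 * LockRelationForExtension()/UnlockRelationForExtension() are never
 * reached without a Relation.
 */
Assert(eb.rel != NULL || (flags & EB_SKIP_EXTENSION_LOCK) != 0);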
Re: [17] CREATE SUBSCRIPTION ... SERVER
On Sat, Sep 2, 2023 at 12:24 AM Jeff Davis wrote: > > On Fri, 2023-09-01 at 12:28 +0530, Ashutosh Bapat wrote: > > Thinking larger, how about we allow any FDW to be used here. > > That's a possibility, but I think that means the subscription would > need to constantly re-check the parameters rather than relying on the > FDW's validator. > > Otherwise it might be the wrong kind of FDW, and the user might be able > to circumvent the password_required protection. It might not even be a > postgres-related FDW at all, which would be a bit strange. > > If it's constantly re-checking the parameters then it raises the > possibility that some "ALTER SERVER" or "ALTER USER MAPPING" succeeds > but then subscriptions to that foreign server start failing, which > would not be ideal. But I could be fine with that. Why do we need to re-check parameters constantly? We will need to restart subscriptions which are using the user mapping of FDW when user mapping or server options change. If that mechanism isn't there, we will need to build it. But that's doable. I didn't understand your worry about circumventing password_required protection. > > > But I think there's some value in bringing > > together these two subsystems which deal with foreign data logically > > (as in logical vs physical view of data). > > I still don't understand how a core dependency on an extension would > work. We don't need to if we allow any FDW (even if non-postgreSQL) to be specified there. For non-postgresql FDW the receiver will need to construct the appropriate command and use appropriate protocol to get the changes and apply locally. The server at the other end may not even have logical replication capability. The extension or "input plugin" (as against output plugin) would decide whether it can deal with the foreign server specific logical replication protocol. We add another callback to FDW which can return whether the given foreign server supports logical replication or not. If the users misconfigured, their subscriptions will throw errors. But with this change we open a built-in way to "replicate in" as we have today to "replicate out". -- Best Wishes, Ashutosh Bapat
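To make that last idea a bit more concrete, the new callback could be sketched as an optional member of FdwRoutine; the name and signature below are purely illustrative assumptions, not an agreed API:

/*
 * Hypothetical callback that could be added to FdwRoutine, letting core ask
 * whether a foreign server managed by this FDW can act as a source for
 * logical replication.  Name and signature are assumptions for illustration
 * only.
 */
typedef bool (*SupportsLogicalReplication_function) (ForeignServer *server);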
Re: SLRUs in the main buffer pool, redux
Hi, > Unfortunately the patchset rotted quite a bit since February and needs a > rebase. A consensus was reached [1] to mark this patch as RwF for now. There are many patches to be reviewed and this one doesn't seem to be in the best shape, so we have to prioritise. Please feel free re-submitting the patch for the next commitfest. [1]: https://postgr.es/m/0737f444-59bb-ac1d-2753-873c40da0840%40eisentraut.org -- Best regards, Aleksander Alekseev
Re: [PATCH] Infinite loop while acquiring new TOAST Oid
Hi, > Aleksander, thank you for reminding me of this patch, try to do it in a few > days. A consensus was reached [1] to mark this patch as RwF for now. There are many patches to be reviewed and this one doesn't seem to be in the best shape, so we have to prioritise. Please feel free re-submitting the patch for the next commitfest. [1]: https://postgr.es/m/0737f444-59bb-ac1d-2753-873c40da0840%40eisentraut.org -- Best regards, Aleksander Alekseev
Re: Flush SLRU counters in checkpointer process
Hi, > I think I've managed to reproduce the issue. The test I've added to check > slru flush was the one failing in the regression suite. A consensus was reached [1] to mark this patch as RwF for now. There are many patches to be reviewed and this one doesn't seem to be in the best shape, so we have to prioritise. Please feel free re-submitting the patch for the next commitfest. [1]: https://postgr.es/m/0737f444-59bb-ac1d-2753-873c40da0840%40eisentraut.org -- Best regards, Aleksander Alekseev
Re: proposal: psql: show current user in prompt
On Sun, 3 Sept 2023 at 20:58, Pavel Stehule wrote: > here is a try Overall it does what I had in mind. Below are a few suggestions: +int +PQprotocolSubversion(const PGconn *conn) Ugh, it's quite annoying that the original PQprotocolVersion only returns the major version and thus we need this new function. It seems like it would be much nicer if it returned a number similar to PQserverVersion. I think it might be nicer to change PQprotocolVersion to do that than to add another function. We could do: return PG_PROTOCOL_MAJOR(conn->pversion) * 100 + PG_PROTOCOL_MINOR(conn->pversion); or even: if (PG_PROTOCOL_MAJOR(conn->pversion) == 3 && PG_PROTOCOL_MINOR(conn->pversion) == 0) return 3; else return PG_PROTOCOL_MAJOR(conn->pversion) * 100 + PG_PROTOCOL_MINOR(conn->pversion); The second option would be the safest backwards-compatibility-wise, but in practice you would only get a value other than 3 (or 0) when connecting to pre-7.4 servers. That seems old enough that I don't think anyone is actually calling this function. **I'd like some feedback from others on this though.** + /* The protocol 3.0 is required */ + if (PG_PROTOCOL_MAJOR(their_version) == 3) + conn->pversion = their_version; Let's compare against the actual PG_PROTOCOL_EARLIEST and PG_PROTOCOL_LATEST to determine if the version is supported or not.
Re: Autogenerate some wait events code and documentation
Hi, On 8/29/23 8:41 AM, Michael Paquier wrote: On Tue, Aug 29, 2023 at 08:17:10AM +0200, Drouvot, Bertrand wrote: That may be a bug in the matrix because of bb90022, as git am can be easily pissed. git am does not complain anymore. + # Generate the element name for the enums based on the + # description. The C symbols are prefixed with "WAIT_EVENT_". Nit: 2 whitespaces before "The C" # Build the descriptions. There are in camel-case. # LWLock and Lock classes do not need any modifications. Nit: 2 whitespaces before "There are in camel" + my $waiteventdescription = ''; + if ( $waitclassname eq 'WaitEventLWLock' Nit: Too many whitespace after the "if (" ?? (I guess pgperltidy would fix it). I have double-checked the docs generated, while on it, and I am not seeing anything missing, particularly for the LWLock and Lock parts.. I did compare the output of select * from pg_wait_events order by 1,2 and ignored the case (with and without the patch series). Then, the only diff is: < Client,WalSenderWaitWal,Waiting for WAL to be flushed in WAL sender process --- Client,WalSenderWaitForWAL,Waiting for WAL to be flushed in WAL sender process That said, it looks good to me. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Extract numeric filed in JSONB more effectively
Hi Jian, > SELECT (test_json -> 'field1')::int4 FROM test_jsonb WHERE json_type = 'object'; > -ERROR: cannot cast jsonb string to type integer > +ERROR: unknown jsonb type: 1125096840 Thanks for the report! The reason is that I return the address of a local variable. Datum jsonb_object_field_start(PG_FUNCTION_ARGS) { JsonbValue *v; JsonbValue vbuf; v = getKeyJsonValueFromContainer(&jb->root, VARDATA_ANY(key), VARSIZE_ANY_EXHDR(key), &vbuf); PG_RETURN_POINTER(v); } Here v points to vbuf, which is a local variable on the stack. I'm confused about why it works on my local machine and also in most queries in cfbot. The fix is below: v = getKeyJsonValueFromContainer(&jb->root, VARDATA_ANY(key), VARSIZE_ANY_EXHDR(key), NULL); I will send an updated version soon. -- Best Regards Andy Fan
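As an aside, the observed behaviour is consistent with returning a pointer into a dead stack frame: it is undefined behaviour, and it can appear to work until the frame happens to be reused. A minimal sketch of the corrected function is below (illustrative only; argument handling and the key-not-found case are simplified):

/*
 * Passing NULL as the last argument makes getKeyJsonValueFromContainer()
 * palloc the JsonbValue in the current memory context, so the pointer
 * remains valid after the function returns.
 */
Datum
jsonb_object_field_start(PG_FUNCTION_ARGS)
{
    Jsonb      *jb = PG_GETARG_JSONB_P(0);
    text       *key = PG_GETARG_TEXT_PP(1);
    JsonbValue *v;

    v = getKeyJsonValueFromContainer(&jb->root,
                                     VARDATA_ANY(key),
                                     VARSIZE_ANY_EXHDR(key),
                                     NULL);    /* palloc'd result */

    PG_RETURN_POINTER(v);
}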
Re: Avoid a possible null pointer (src/backend/utils/adt/pg_locale.c)
Em dom., 3 de set. de 2023 às 22:01, Michael Paquier escreveu: > On Sat, Sep 02, 2023 at 09:13:11AM -0300, Ranier Vilela wrote: > > I tried to keep the same behavior as before. > > Note that if the locale equals COLLPROVIDER_LIBC, > > the message to the user will be the same. > > -/* shouldn't happen */ > -elog(ERROR, "unsupported collprovider: %c", locale->provider); > +elog(ERROR, "collprovider '%c' does not support pg_strnxfrm_prefix()", > + locale->provider); > > Based on what's written here, these messages could be better because > full sentences are not encouraged in error messages, even for > non-translated elogs: > https://www.postgresql.org/docs/current/error-style-guide.html > > Perhaps something like "could not use collprovider %c: no support for > %s", where the function name is used, would be more consistent. > Sure. I have no objection to the wording of the message. If there is consensus, I can send another patch. best regards, Ranier Vilela
Re: cataloguing NOT NULL constraints
Looking at your 0001 patch, which adds tests for some of the information_schema views, I think it's a bad idea to put them in whatever other regression .sql files; they would be subject to concurrent changes depending on what other tests are being executed in the same parallel test. I suggest to put them all in a separate .sql file, and schedule that to run in the last concurrent group, together with the tablespace test. This way, it would capture all the objects left over by other test files. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/
Re: Optimize planner memory consumption for huge arrays
On Mon, Sep 4, 2023 at 3:49 PM Lepikhov Andrei wrote: > > > + Const *c = makeConst(nominal_element_type, > > + -1, > > + nominal_element_collation, > > + elmlen, > > + elem_values[i], > > + elem_nulls[i], > > + elmbyval); > > + > > + args = list_make2(leftop, c); > > if (is_join_clause) > > s2 = DatumGetFloat8(FunctionCall5Coll(&oprselproc, > > clause->inputcollid, > > @@ -1984,7 +1985,8 @@ scalararraysel(PlannerInfo *root, > > ObjectIdGetDatum(operator), > > PointerGetDatum(args), > > Int32GetDatum(varRelid))); > > - > > + list_free(args); > > + pfree(c); > > > > Maybe you can just use list_free_deep, instead of storing the constant > > in a separate variable. > As I see, the first element in the array is leftop, which is used on other > iterations. So, we can't use list_free_deep here. Yeah you are right, thanks. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
[PATCH] Add inline comments to the pg_hba_file_rules view
Hi, This patch proposes the column "comment" to the pg_hba_file_rules view. It basically parses the inline comment (if any) of a valid pg_hba.conf entry and displays it in the new column. For such pg_hba entries ... host db jim 127.0.0.1/32 md5 # foo host db jim 127.0.0.1/32 md5 #bar host db jim 127.0.0.1/32 md5 # #foo# ... it returns the following pg_hba_file_rules records: postgres=# SELECT type, database, user_name, address, comment FROM pg_hba_file_rules WHERE user_name[1]='jim'; type | database | user_name | address | comment --+--+---+---+- host | {db} | {jim} | 127.0.0.1 | foo host | {db} | {jim} | 127.0.0.1 | bar host | {db} | {jim} | 127.0.0.1 | #foo# (3 rows) This feature can come in quite handy when we need to read important comments from the hba entries without having access to the pg_hba.conf file directly. The patch slightly changes the test 004_file_inclusion.pl to accommodate the new column and the hba comments. Discussion: https://www.postgresql.org/message-id/flat/3fec6550-93b0-b542-b203-b0054aaee83b%40uni-muenster.de Best regards, Jim From bb795fae29a0f714c590f94176a4675d7ae85a2f Mon Sep 17 00:00:00 2001 From: Jim Jones Date: Mon, 4 Sep 2023 12:32:04 +0200 Subject: [PATCH v1] Add inline comments to the pg_hba_file_rules view This patch proposes the column "comment" to the pg_hba_file_rules view. It basically parses the inline comment (if any) of a valid pg_hba.conf entry and displays it in the new column. The patch slightly changes the test 004_file_inclusion.pl to accomodate the new column and the hba comments. The view's documentation at system-views.sgml is extended accordingly. The function GetInlineComment() was added to conffiles.c to avoid adding more complexity directly to hba.c. It also enables this feature to be used by other configuration files, if necessary. 
--- doc/src/sgml/system-views.sgml| 9 + src/backend/utils/adt/hbafuncs.c | 11 +- src/backend/utils/misc/conffiles.c| 26 + src/include/catalog/pg_proc.dat | 6 +-- src/include/libpq/hba.h | 1 + src/include/utils/conffiles.h | 1 + .../authentication/t/004_file_inclusion.pl| 39 --- src/test/regress/expected/rules.out | 3 +- 8 files changed, 85 insertions(+), 11 deletions(-) diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml index 2b35c2f91b..d4951372a5 100644 --- a/doc/src/sgml/system-views.sgml +++ b/doc/src/sgml/system-views.sgml @@ -1090,6 +1090,15 @@ + + + comment text + + + Text after the # comment character in the end of a valid pg_hba.conf entry, if any + + + error text diff --git a/src/backend/utils/adt/hbafuncs.c b/src/backend/utils/adt/hbafuncs.c index 73d3ad1dad..929678e97e 100644 --- a/src/backend/utils/adt/hbafuncs.c +++ b/src/backend/utils/adt/hbafuncs.c @@ -22,6 +22,7 @@ #include "utils/array.h" #include "utils/builtins.h" #include "utils/guc.h" +#include "utils/conffiles.h" static ArrayType *get_hba_options(HbaLine *hba); @@ -159,7 +160,7 @@ get_hba_options(HbaLine *hba) } /* Number of columns in pg_hba_file_rules view */ -#define NUM_PG_HBA_FILE_RULES_ATTS 11 +#define NUM_PG_HBA_FILE_RULES_ATTS 12 /* * fill_hba_line @@ -191,6 +192,7 @@ fill_hba_line(Tuplestorestate *tuple_store, TupleDesc tupdesc, const char *addrstr; const char *maskstr; ArrayType *options; + char *comment; Assert(tupdesc->natts == NUM_PG_HBA_FILE_RULES_ATTS); @@ -346,6 +348,13 @@ fill_hba_line(Tuplestorestate *tuple_store, TupleDesc tupdesc, values[index++] = PointerGetDatum(options); else nulls[index++] = true; + + /* comments */ + comment = GetInlineComment(hba->rawline); + if(comment) + values[index++] = CStringGetTextDatum(comment); + else + nulls[index++] = true; } else { diff --git a/src/backend/utils/misc/conffiles.c b/src/backend/utils/misc/conffiles.c index 376a5c885b..0f4169dea3 100644 --- a/src/backend/utils/misc/conffiles.c +++ b/src/backend/utils/misc/conffiles.c @@ -25,6 +25,32 @@ #include "storage/fd.h" #include "utils/conffiles.h" +/* + * GetInlineComment + * + * This function returns comments of a given config file line, + * if there is any. A comment is any text after the # character. + */ +char * +GetInlineComment(char *line) +{ + size_t len; + char *comment = strchr(line, '#'); + + if(!comment) + return NULL; + + /* ignoring '#' character, as we don't need it in the comment itself. */ + comment++; + len = strlen(comment); + + /* trim leading and trailing whitespaces */ + while(isspace(comment[len - 1])) --len; + while(*comment && isspace(*comment)) ++comment, --len; + + return pnstrdup(comment, len); +} + /* * AbsoluteConfigLocation * diff --git a/src/include/catalog/pg_proc.dat b/src/inc
Re: Impact of checkpointer during pg_upgrade
On Mon, Sep 4, 2023 at 1:41 PM Dilip Kumar wrote: > > > > I think we can do better, like we can just read the latest > > > checkpoint's LSN before starting the old cluster. And now while > > > checking the slot can't we check if the the slot is invalidated then > > > their confirmed_flush_lsn >= the latest_checkpoint_lsn we preserved > > > before starting the cluster because if so then those slot might have > > > got invalidated during the upgrade no? > > > > > > > Isn't that possible only if we update confirmend_flush LSN while > > invalidating? Otherwise, how the check you are proposing can succeed? > > I am not suggesting to compare the confirmend_flush_lsn to the latest > checkpoint LSN instead I am suggesting that before starting the > cluster we get the location of the latest checkpoint LSN that should > be the shutdown checkpoint LSN. So now also in [1] we check that > confirmed flush lsn should be equal to the latest checkpoint lsn. So > the only problem is that after we restart the cluster during the > upgrade we might invalidate some of the slots which are perfectly fine > to migrate and we want to identify those slots. So if we know the the > LSN of the shutdown checkpoint before the cluster started then we can > perform a additional checks on all the invalidated slots that their > confirmed lsn >= shutdown checkpoint lsn we preserved before > restarting the cluster (not the latest checkpoint lsn) then those > slots got invalidated only after we started the cluster for upgrade? > Is there any loophole in this theory? This theory is based on the > assumption that the confirmed flush lsn are not moving forward for the > already invalidated slots that means the slot which got invalidated > before we shutdown for upgrade will have confirm flush lsn value < > shutdown checkpoint and the slots which got invalidated during the > upgrade will have confirm flush lsn at least equal to the shutdown > checkpoint. Said that there is a possibility that some of the slots which got invalidated even on the previous checkpoint might get the same LSN as the slot which got invalidated later if there is no activity between these two checkpoints. So if we go with this approach then there is some risk of migrating some of the slots which were already invalidated even before the shutdown checkpoint. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: More new SQL/JSON item methods
> > Looking at the SQL standard itself, in the 2023 edition section 9.46 > "SQL/JSON path language: syntax and semantics", it shows this: > > ::= > type > | size > | double > | ceiling > | floor > | abs > | datetime [ ] > | keyvalue > | bigint > | boolean > | date > | decimal [ [ ] ] > | integer > | number > | string > | time [ ] > | time_tz [ ] > | timestamp [ ] > | timestamp_tz [ ] > > and then details, for each of those, rules like > > III) If JM specifies , then: > 1) For all j, 1 (one) ≤ j ≤ n, > Case: > a) If I_j is not a number or character string, then let ST be data >exception — non-numeric SQL/JSON item (22036). > b) Otherwise, let X be an SQL variable whose value is I_j. >Let V_j be the result of >CAST (X AS DOUBLE PRECISION) >If this conversion results in an exception condition, then >let ST be that exception condition. > 2) Case: > a) If ST is not successful completion, then the result of JAE >is ST. > b) Otherwise, the result of JAE is the SQL/JSON sequence V_1, >..., V_n. > > so at least superficially our implementation is constrained by what the > SQL standard says to do, and we should verify that this implementation > matches those rules. We don't necessarily need to watch what do other > specs such as jsonpath itself. > I believe our current implementation of the .double() method is in line with this. And these new methods are following the same suit. > > - surely there's a more direct way to make boolean from numeric > > than to serialize the numeric and parse an int? > Yeah, we can directly check the value = 0 for false, true otherwise. But looking at the PostgreSQL conversion to bool, it doesn't allow floating point values to be converted to boolean and only accepts int4. That's why I did the int4 conversion. Thanks -- Jeevan Chalke *Senior Staff SDE, Database Architect, and ManagerProduct Development* edbpostgres.com
Re: Optimize planner memory consumption for huge arrays
On Mon, Sep 4, 2023, at 3:37 PM, Dilip Kumar wrote: > On Mon, Sep 4, 2023 at 11:58 AM Lepikhov Andrei > wrote: >> >> Hi, hackers, >> >> Looking at the planner behaviour with the memory consumption patch [1], I >> figured out that arrays increase memory consumption by the optimizer >> significantly. See init.sql in attachment. >> The point here is that the planner does small memory allocations for each >> element during estimation. As a result, it looks like the planner consumes >> about 250 bytes for each integer element. >> >> It is maybe not a problem most of the time. However, in the case of >> partitions, memory consumption multiplies by each partition. Such a corner >> case looks weird, but the fix is simple. So, why not? >> >> The diff in the attachment is proof of concept showing how to reduce wasting >> of memory. Having benchmarked a bit, I didn't find any overhead. > > + Const *c = makeConst(nominal_element_type, > + -1, > + nominal_element_collation, > + elmlen, > + elem_values[i], > + elem_nulls[i], > + elmbyval); > + > + args = list_make2(leftop, c); > if (is_join_clause) > s2 = DatumGetFloat8(FunctionCall5Coll(&oprselproc, > clause->inputcollid, > @@ -1984,7 +1985,8 @@ scalararraysel(PlannerInfo *root, > ObjectIdGetDatum(operator), > PointerGetDatum(args), > Int32GetDatum(varRelid))); > - > + list_free(args); > + pfree(c); > > Maybe you can just use list_free_deep, instead of storing the constant > in a separate variable. As I see, the first element in the array is leftop, which is used on other iterations. So, we can't use list_free_deep here. -- Regards, Andrei Lepikhov
Re: persist logical slots to disk during shutdown checkpoint
On Mon, 4 Sept 2023 at 15:20, Amit Kapila wrote: > > On Fri, Sep 1, 2023 at 10:50 AM vignesh C wrote: > > > > On Fri, 1 Sept 2023 at 10:06, Amit Kapila wrote: > > > > > > On Thu, Aug 31, 2023 at 6:12 PM Ashutosh Bapat > > > wrote: > > > > > > > > On Thu, Aug 31, 2023 at 2:52 PM Amit Kapila > > > > wrote: > > > > > > > > > > All but one. Normally, the idea of marking dirty is to indicate that > > > > > we will actually write/flush the contents at a later point (except > > > > > when required for correctness) as even indicated in the comments atop > > > > > ReplicatioinSlotMarkDirty(). However, I see your point that we use > > > > > that protocol at all the current places including CreateSlotOnDisk(). > > > > > So, we can probably do it here as well. > > > > > > > > yes > > > > > > > > > > I think we should also ensure that slots are not invalidated > > > (slot.data.invalidated != RS_INVAL_NONE) before marking them dirty > > > because we don't allow decoding from such slots, so we shouldn't > > > include those. > > > > Added this check. > > > > Apart from this I have also fixed the following issues that were > > agreed on: a) Setting slots to dirty in CheckPointReplicationSlots > > instead of setting it in SaveSlotToPath > > > > + if (is_shutdown && SlotIsLogical(s)) > + { > + SpinLockAcquire(&s->mutex); > + if (s->data.invalidated == RS_INVAL_NONE && > + s->data.confirmed_flush != s->last_saved_confirmed_flush) > + s->dirty = true; > > I think it is better to use ReplicationSlotMarkDirty() as that would > be consistent with all other usages. ReplicationSlotMarkDirty works only on MyReplicationSlot whereas CheckpointReplicationSlots loops through all the slots and marks the appropriate slot as dirty, we might have to change ReplicationSlotMarkDirty to take the slot as input parameter and all caller should pass MyReplicationSlot. Another thing is we have already taken spin lock to access last_confirmed_flush_lsn from CheckpointReplicationSlots, we could set dirty flag here itself, else we will have to release the lock and call ReplicationSlotMarkDirty which will take lock again. Instead shall we set just_dirtied also in CheckpointReplicationSlots? Thoughts? Regards, Vignesh
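To illustrate the alternative being discussed here (setting the flags while the spinlock is already held, instead of calling ReplicationSlotMarkDirty()), a rough sketch could look like this; last_saved_confirmed_flush is the field introduced by the patch under discussion, and the snippet is only a sketch, not the final coding:

/*
 * Mark the slot dirty directly while CheckPointReplicationSlots() already
 * holds the slot's spinlock, mirroring what ReplicationSlotMarkDirty()
 * does (including just_dirtied).
 */
if (is_shutdown && SlotIsLogical(s))
{
    SpinLockAcquire(&s->mutex);
    if (s->data.invalidated == RS_INVAL_NONE &&
        s->data.confirmed_flush != s->last_saved_confirmed_flush)
    {
        s->just_dirtied = true;
        s->dirty = true;
    }
    SpinLockRelease(&s->mutex);
}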
Re: persist logical slots to disk during shutdown checkpoint
On Fri, Sep 1, 2023 at 10:50 AM vignesh C wrote: > > On Fri, 1 Sept 2023 at 10:06, Amit Kapila wrote: > > > > On Thu, Aug 31, 2023 at 6:12 PM Ashutosh Bapat > > wrote: > > > > > > On Thu, Aug 31, 2023 at 2:52 PM Amit Kapila > > > wrote: > > > > > > > > All but one. Normally, the idea of marking dirty is to indicate that > > > > we will actually write/flush the contents at a later point (except > > > > when required for correctness) as even indicated in the comments atop > > > > ReplicatioinSlotMarkDirty(). However, I see your point that we use > > > > that protocol at all the current places including CreateSlotOnDisk(). > > > > So, we can probably do it here as well. > > > > > > yes > > > > > > > I think we should also ensure that slots are not invalidated > > (slot.data.invalidated != RS_INVAL_NONE) before marking them dirty > > because we don't allow decoding from such slots, so we shouldn't > > include those. > > Added this check. > > Apart from this I have also fixed the following issues that were > agreed on: a) Setting slots to dirty in CheckPointReplicationSlots > instead of setting it in SaveSlotToPath > + if (is_shutdown && SlotIsLogical(s)) + { + SpinLockAcquire(&s->mutex); + if (s->data.invalidated == RS_INVAL_NONE && + s->data.confirmed_flush != s->last_saved_confirmed_flush) + s->dirty = true; I think it is better to use ReplicationSlotMarkDirty() as that would be consistent with all other usages. -- With Regards, Amit Kapila.
Re: Incremental View Maintenance, take 2
On Sat, Sep 2, 2023 at 7:46 PM Tatsuo Ishii wrote: > > attached is my refactor. there is some whitespace errors in the > > patches, you need use > > git apply --reject --whitespace=fix > > basedon_v29_matview_c_refactor_update_set_clause.patch > > > > Also you patch cannot use git apply, i finally found out bulk apply > > I have no problem with applying Yugo's v29 patches using git apply, no > white space errors. Thanks. I downloaded the patches from the postgres website, and then the problem was solved. Other ideas based on v29: src/include/utils/rel.h 680: #define RelationIsIVM(relation) ((relation)->rd_rel->relisivm) I guess it would be better to add a comment describing its usage, since all the peer macros have comments. For the pg_class change, I guess we need to bump CATALOG_VERSION_NO? Small issue: makeIvmAggColumn and calc_delta need to add an empty return statement? Style issue: in gram.y, should "incremental" be upper case? + CREATE OptNoLog incremental MATERIALIZED VIEW create_mv_target AS SelectStmt opt_with_data I don't know how pgindent works; do you need to add some keywords to src/tools/pgindent/typedefs.list to make indentation work? In /* If this is not the last AFTER trigger call, immediately exit. */ Assert (entry->before_trig_count >= entry->after_trig_count); if (entry->before_trig_count != entry->after_trig_count) return PointerGetDatum(NULL); before returning NULL, do you also need clean_up_IVM_hash_entry? (I don't know when this case will happen.) In /* Replace the modified table with the new delta table and calculate the new view delta*/ replace_rte_with_delta(rte, table, true, queryEnv); refresh_matview_datafill(dest_new, query, queryEnv, tupdesc_new, ""); replace_rte_with_delta does not change its arguments table and queryEnv, and refresh_matview_datafill only uses part of the arguments of calc_delta, so I guess I am confused by the usage of replace_rte_with_delta. Also, I think it should return void, since it just modifies the input argument. Here refresh_matview_datafill is just persisting the new delta content to dest_new?
Re: pg_upgrade and logical replication
On Mon, Sep 4, 2023 at 12:15 PM Michael Paquier wrote: > > On Mon, Sep 04, 2023 at 11:51:14AM +0530, Amit Kapila wrote: > > +1 for doing it via function (something like > > binary_upgrade_create_sub_rel_state). We already have the internal > > function AddSubscriptionRelState() that can do the core work. > > It is one of these patches that I have let aside for too long, and it > solves a use-case of its own. I think that I could hack that pretty > quickly given that Julien has done a bunch of the ground work. Would > you agree with that? > Yeah, I agree that could be hacked quickly but note I haven't reviewed in detail if there are other design issues in this patch. Note that we thought first to support the upgrade of the publisher node, otherwise, immediately after upgrading the subscriber and publisher, the subscriptions won't work and start giving errors as they are dependent on slots in the publisher. One other point that needs some thought is that the LSN positions we are going to copy in the catalog may no longer be valid after the upgrade (of the publisher) because we reset WAL. Does that need some special consideration or are we okay with that in all cases? As of now, things are quite safe as documented in pg_dump doc page that it will be the user's responsibility to set up replication after dump/restore. I think it would be really helpful if you could share your thoughts on the publisher-side matter as we are facing a few tricky questions to be answered. For example, see a new thread [1]. > > Like the publisher-side upgrade patch [1], I think we should allow > > upgrading subscriptions by default instead with some flag like > > --preserve-subscription-state. If required, we can introduce --exclude > > option for upgrade. Having it just for pg_dump sounds reasonable to > > me. > > > > [1] - > > https://www.postgresql.org/message-id/TYAPR01MB58664C81887B3AF2EB6B16E3F5939%40TYAPR01MB5866.jpnprd01.prod.outlook.com > > In the interface of the publisher for pg_upgrade agreed on and set in > stone? I certainly agree to have a consistent upgrade experience for > the two sides of logical replication, publications and subscriptions. > Also, I'd rather have a filtering option at the same time as the > upgrade option to give more control to users from the start. > The point raised by Jonathan for not having an option for pg_upgrade is that it will be easy for users, otherwise, users always need to enable this option. Consider a replication setup, wouldn't users want by default it to be upgraded? Asking them to do that via an option would be an inconvenience. So, that was the reason, we wanted to have an --exclude option and by default allow slots to be upgraded. I think the same theory applies here. [1] - https://www.postgresql.org/message-id/CAA4eK1LV3%2B76CSOAk0h8Kv0AKb-OETsJHe6Sq6172-7DZXf0Qg%40mail.gmail.com -- With Regards, Amit Kapila.
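For illustration, the binary_upgrade_create_sub_rel_state() function mentioned above could be little more than a thin SQL-callable wrapper around AddSubscriptionRelState(); the argument order and types below are assumptions rather than a settled design:

/*
 * Hypothetical sketch of binary_upgrade_create_sub_rel_state(): hand the
 * dumped per-relation state straight to AddSubscriptionRelState().
 * Signature and argument handling are illustrative assumptions only.
 */
Datum
binary_upgrade_create_sub_rel_state(PG_FUNCTION_ARGS)
{
    Oid         subid = PG_GETARG_OID(0);
    Oid         relid = PG_GETARG_OID(1);
    char        state = PG_GETARG_CHAR(2);
    XLogRecPtr  sublsn;

    sublsn = PG_ARGISNULL(3) ? InvalidXLogRecPtr : PG_GETARG_LSN(3);

    AddSubscriptionRelState(subid, relid, state, sublsn);

    PG_RETURN_VOID();
}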
Re: In-placre persistance change of a relation
At Thu, 24 Aug 2023 11:22:32 +0900 (JST), Kyotaro Horiguchi wrote in > I could turn this into something like undo longs in a simple form, but > I'd rather not craft a general-purpose undo log system for this unelss > it's absolutely necessary. This is a patch for a basic undo log implementation. It looks like it works well for some orphan-files-after-a-crash and data-loss-on-reinit cases. However, it is far from complete and likely has issues with crash-safety and the durability of undo log files (and memory leaks and performance and..). I'm posting this to move the discussion forward. (This doesn't contain the third file "ALTER TABLE ..ALL IN TABLESPACE" part.) regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From da5696b9026fa916ae991f7da616062c5b19e705 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 31 Aug 2023 11:49:10 +0900 Subject: [PATCH v29 1/2] Introduce undo log implementation This patch adds a simple implementation of UNDO log feature. --- src/backend/access/transam/Makefile| 1 + src/backend/access/transam/rmgr.c | 4 +- src/backend/access/transam/simpleundolog.c | 343 + src/backend/access/transam/twophase.c | 3 + src/backend/access/transam/xact.c | 24 ++ src/backend/access/transam/xlog.c | 20 +- src/backend/catalog/storage.c | 171 ++ src/backend/storage/file/reinit.c | 78 + src/backend/storage/smgr/smgr.c| 9 + src/bin/initdb/initdb.c| 17 + src/bin/pg_rewind/parsexlog.c | 2 +- src/bin/pg_waldump/rmgrdesc.c | 2 +- src/include/access/rmgr.h | 2 +- src/include/access/rmgrlist.h | 44 +-- src/include/access/simpleundolog.h | 36 +++ src/include/catalog/storage.h | 3 + src/include/catalog/storage_ulog.h | 35 +++ src/include/catalog/storage_xlog.h | 9 + src/include/storage/reinit.h | 2 + src/include/storage/smgr.h | 1 + src/tools/pgindent/typedefs.list | 6 + 21 files changed, 780 insertions(+), 32 deletions(-) create mode 100644 src/backend/access/transam/simpleundolog.c create mode 100644 src/include/access/simpleundolog.h create mode 100644 src/include/catalog/storage_ulog.h diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile index 661c55a9db..531505cbbd 100644 --- a/src/backend/access/transam/Makefile +++ b/src/backend/access/transam/Makefile @@ -21,6 +21,7 @@ OBJS = \ rmgr.o \ slru.o \ subtrans.o \ + simpleundolog.o \ timeline.o \ transam.o \ twophase.o \ diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c index 7d67eda5f7..840cbdecd3 100644 --- a/src/backend/access/transam/rmgr.c +++ b/src/backend/access/transam/rmgr.c @@ -35,8 +35,8 @@ #include "utils/relmapper.h" /* must be kept in sync with RmgrData definition in xlog_internal.h */ -#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode) \ - { name, redo, desc, identify, startup, cleanup, mask, decode }, +#define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode,undo) \ + { name, redo, desc, identify, startup, cleanup, mask, decode}, RmgrData RmgrTable[RM_MAX_ID + 1] = { #include "access/rmgrlist.h" diff --git a/src/backend/access/transam/simpleundolog.c b/src/backend/access/transam/simpleundolog.c new file mode 100644 index 00..ebbacce298 --- /dev/null +++ b/src/backend/access/transam/simpleundolog.c @@ -0,0 +1,343 @@ +/*- + * + * simpleundolog.c + * Simple implementation of PostgreSQL transaction-undo-log manager + * + * In this module, procedures required during a transaction abort are + * logged. 
Persisting this information becomes crucial, particularly for + * ensuring reliable post-processing during the restart following a transaction + * crash. At present, in this module, logging of information is performed by + * simply appending data to a created file. + * + * Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * src/backend/access/transam/clog.c + * + *- + */ + +#include "postgres.h" + +#include "access/simpleundolog.h" +#include "access/twophase_rmgr.h" +#include "access/parallel.h" +#include "access/xact.h" +#include "catalog/storage_ulog.h" +#include "storage/fd.h" + +#define ULOG_FILE_MAGIC 0x12345678 + +typedef struct UndoLogFileHeader +{ + int32 magic; + bool prepared; +} UndoLogFileHeader; + +typedef struct UndoDescData +{ + const char *name; + void (*rm_undo) (SimpleUndoLogRecord *record, bool prepared); +} UndoDescData; + +/* must be kept in sync with RmgrData definition in xlog_internal.h */ +#define PG_RMGR(sy
Re: Optimize planner memory consumption for huge arrays
On Mon, Sep 4, 2023 at 11:58 AM Lepikhov Andrei wrote: > > Hi, hackers, > > Looking at the planner behaviour with the memory consumption patch [1], I > figured out that arrays increase memory consumption by the optimizer > significantly. See init.sql in attachment. > The point here is that the planner does small memory allocations for each > element during estimation. As a result, it looks like the planner consumes > about 250 bytes for each integer element. > > It is maybe not a problem most of the time. However, in the case of > partitions, memory consumption multiplies by each partition. Such a corner > case looks weird, but the fix is simple. So, why not? > > The diff in the attachment is proof of concept showing how to reduce wasting > of memory. Having benchmarked a bit, I didn't find any overhead. + Const *c = makeConst(nominal_element_type, + -1, + nominal_element_collation, + elmlen, + elem_values[i], + elem_nulls[i], + elmbyval); + + args = list_make2(leftop, c); if (is_join_clause) s2 = DatumGetFloat8(FunctionCall5Coll(&oprselproc, clause->inputcollid, @@ -1984,7 +1985,8 @@ scalararraysel(PlannerInfo *root, ObjectIdGetDatum(operator), PointerGetDatum(args), Int32GetDatum(varRelid))); - + list_free(args); + pfree(c); Maybe you can just use list_free_deep, instead of storing the constant in a separate variable. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Impact of checkpointer during pg_upgrade
On Mon, Sep 4, 2023 at 11:18 AM Amit Kapila wrote: > > On Mon, Sep 4, 2023 at 10:33 AM Dilip Kumar wrote: > > > > On Mon, Sep 4, 2023 at 8:41 AM Amit Kapila wrote: > > > > > > On Sat, Sep 2, 2023 at 6:12 PM Dilip Kumar wrote: > > > > > > > > On Sat, Sep 2, 2023 at 10:09 AM Amit Kapila > > > > wrote: > > > > > > > > > The other possibilities apart from not allowing an upgrade in such a > > > > > case could be (a) Before starting the old cluster, we fetch the slots > > > > > directly from the disk using some tool like [2] and make the decisions > > > > > based on that state; > > > > > > > > Okay, so IIUC along with dumping the slot data we also need to dump > > > > the latest checkpoint LSN because during upgrade we do check that the > > > > confirmed flush lsn for all the slots should be the same as the latest > > > > checkpoint. Yeah but I think we could work this out. > > > > > > > We already have the latest checkpoint LSN information from > > > pg_controldata. I think we can use that as the patch proposed in the > > > thread [1] is doing now. Do you have something else in mind? > > > > I think I did not understood the complete proposal. And what I meant > > is that if we dump the slot before we start the cluster thats fine. > > But then later after starting the old cluster if we run some query > > like we do in check_old_cluster_for_valid_slots() then thats not > > right, because there is gap between the status of the slots what we > > dumped before starting the cluster and what we are checking after the > > cluster, so there is not point of that check right? > > > > That's right but if we do read slots from disk, we preserve those in > the memory and use that information instead of querying it again in > check_old_cluster_for_valid_slots(). > > > > > (b) During the upgrade, we don't allow WAL to be > > > > > removed if it can invalidate slots; (c) Copy/Migrate the invalid slots > > > > > as well but for that, we need to expose an API to invalidate the > > > > > slots; > > > > > > > > (d) somehow distinguish the slots that are invalidated during > > > > > an upgrade and then simply copy such slots because anyway we ensure > > > > > that all the WAL required by slot is sent before shutdown. > > > > > > > > Yeah this could also be an option, although we need to think the > > > > mechanism of distinguishing those slots looks clean and fit well with > > > > other architecture. > > > > > > > > > > If we want to do this we probably need to maintain a flag in the slot > > > indicating that it was invalidated during an upgrade and then use the > > > same flag in the upgrade to check the validity of slots. I think such > > > a flag needs to be maintained at the same level as > > > ReplicationSlotInvalidationCause to avoid any inconsistency among > > > those. > > > > I think we can do better, like we can just read the latest > > checkpoint's LSN before starting the old cluster. And now while > > checking the slot can't we check if the the slot is invalidated then > > their confirmed_flush_lsn >= the latest_checkpoint_lsn we preserved > > before starting the cluster because if so then those slot might have > > got invalidated during the upgrade no? > > > > Isn't that possible only if we update confirmend_flush LSN while > invalidating? Otherwise, how the check you are proposing can succeed? 
I am not suggesting comparing the confirmed_flush_lsn to the latest checkpoint LSN; instead, I am suggesting that before starting the cluster we get the location of the latest checkpoint LSN, which should be the shutdown checkpoint LSN. So now also in [1] we check that the confirmed flush lsn should be equal to the latest checkpoint lsn. So the only problem is that after we restart the cluster during the upgrade we might invalidate some of the slots which are perfectly fine to migrate, and we want to identify those slots. So if we know the LSN of the shutdown checkpoint before the cluster started, then we can perform an additional check on all the invalidated slots: if their confirmed lsn >= the shutdown checkpoint lsn we preserved before restarting the cluster (not the latest checkpoint lsn), then those slots got invalidated only after we started the cluster for the upgrade. Is there any loophole in this theory? This theory is based on the assumption that the confirmed flush lsn does not move forward for already invalidated slots; that means a slot which got invalidated before we shut down for the upgrade will have a confirmed flush lsn value < the shutdown checkpoint, and slots which got invalidated during the upgrade will have a confirmed flush lsn at least equal to the shutdown checkpoint. [1] https://www.postgresql.org/message-id/TYAPR01MB5866F7D8ED15BA1E8E4A2AB0F5E4A%40TYAPR01MB5866.jpnprd01.prod.outlook.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
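A rough sketch of that additional check on the pg_upgrade side might look like the following; all of the names here (the stashed shutdown_chkpt_lsn, the slot fields) are assumptions for illustration, not the actual patch:

/*
 * Hypothetical sketch: shutdown_chkpt_lsn is the latest checkpoint LSN read
 * from pg_controldata before the old cluster was started.  A slot that is
 * marked invalidated but whose confirmed_flush is at or beyond that LSN can
 * only have been invalidated after the restart for the upgrade, so per the
 * theory above it would still be treated as migratable.
 */
if (slot->invalidated &&
    slot->confirmed_flush >= old_cluster.shutdown_chkpt_lsn)
{
    /* invalidated only during the upgrade itself; keep it */
}
else if (slot->invalidated)
{
    /* invalidated before the shutdown checkpoint; report it as unusable */
}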
Re: Unlogged relation copy is not fsync'd
On Fri, Aug 25, 2023 at 03:47:27PM +0300, Heikki Linnakangas wrote: > That 'copying_initfork' condition is wrong. The first hint of that is that > 'use_wal' is always true for an init fork. I believe this was meant to check > for 'relpersistence == RELPERSISTENCE_UNLOGGED'. Otherwise, this bad thing > can happen: > > 1. Create an unlogged table > 2. ALTER TABLE unlogged_tbl SET TABLESPACE ... -- This calls > RelationCopyStorage > 3. a checkpoint happens while the command is running > 4. After the ALTER TABLE has finished, shut down postgres cleanly. > 5. Lose power > > When you recover, the unlogged table is not reset, because it was a clean > postgres shutdown. But the part of the file that was copied after the > checkpoint at step 3 was never fsync'd, so part of the file is lost. I was > able to reproduce with a VM that I kill to simulate step 5. Oops. The comment at the top of smgrimmedsync() looks incorrect now to me now. When copying the data from something else than an init fork, the relation pages are not WAL-logged for an unlogged relation. > Fortunately the fix is trivial, see attached. I suspect we have similar > problems in a few other places, though. end_heap_rewrite(), _bt_load(), and > gist_indexsortbuild have a similar-looking smgrimmedsync() calls, with no > consideration for unlogged relations at all. I haven't tested those, but > they look wrong to me. Oops ^ N. And end_heap_rewrite() mentions directly RelationCopyStorage().. -- Michael signature.asc Description: PGP signature