Re: [PoC] pg_upgrade: allow to upgrade publisher node
Hi Kuroda-san.

Here are some review comments for v28-0003.

==
src/bin/pg_upgrade/check.c

1. check_and_dump_old_cluster

+	/*
+	 * Logical replication slots can be migrated since PG17. See comments atop
+	 * get_old_cluster_logical_slot_infos().
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
+		check_old_cluster_for_valid_slots(live_check);
+

IIUC we are preferring to use the <= 1600 style of version check instead
of >= 1700 where possible. So this comment and version check ought to be
removed from here, and done inside check_old_cluster_for_valid_slots()
instead.

~~~

2. check_old_cluster_for_valid_slots

+/*
+ * check_old_cluster_for_valid_slots()
+ *
+ * Make sure logical replication slots can be migrated to new cluster.
+ * Following points are checked:
+ *
+ * - All logical replication slots are usable.
+ * - All logical replication slots consumed all WALs, except a
+ *   CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_old_cluster_for_valid_slots(bool live_check)

I suggested in the previous comment above (#1) that the version check
should be moved into this function. Therefore, this function comment now
should also mention slot upgrade is only allowed for >= PG17.

~~~

3.

+static void
+check_old_cluster_for_valid_slots(bool live_check)
+{
+	int			i,
+				ntups,
+				i_slotname;
+	PGresult   *res;
+	DbInfo	   *active_db = &old_cluster.dbarr.dbs[0];
+	PGconn	   *conn;
+
+	/* Quick exit if the cluster does not have logical slots. */
+	if (count_old_cluster_logical_slots() == 0)
+		return;

3a.
See comment #1. At the top of this function body there should be a
version check like:

if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
    return;

~

3b.
/Quick exit/Quick return/

~~~

4.

+	prep_status("Checking for logical replication slots");

I felt that should add the word "valid" like:
"Checking for valid logical replication slots"

~~~

5.

+	/* Check there are no logical replication slots with a 'lost' state. */
+	res = executeQueryOrDie(conn,
+							"SELECT slot_name FROM pg_catalog.pg_replication_slots "
+							"WHERE wal_status = 'lost' AND "
+							"temporary IS FALSE;");

Since the SQL is checking if there *are* lost slots I felt it would be
more natural to reverse that comment.

SUGGESTION
/* Check and reject if there are any logical replication slots with a
'lost' state. */

~~~

6.

+	/*
+	 * Do additional checks if a live check is not required. This requires
+	 * that confirmed_flush_lsn of all the slots is the same as the latest
+	 * checkpoint location, but it would be satisfied only when the server has
+	 * been shut down.
+	 */
+	if (!live_check)

I think the comment can be rearranged slightly:

SUGGESTION
Do additional checks to ensure that 'confirmed_flush_lsn' of all the
slots is the same as the latest checkpoint location.

Note: This can be satisfied only when the old_cluster has been shut
down, so we skip this for "live" checks.

==
src/bin/pg_upgrade/controldata.c

7.

+	/*
+	 * Read the latest checkpoint location if the cluster is PG17
+	 * or later. This is used for upgrading logical replication
+	 * slots.
+	 */
+	if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+	{

Fetching this "Latest checkpoint location:" value is only needed for the
check_old_cluster_for_valid_slots validation check, isn't it? But AFAICT
this code is common for both old_cluster and new_cluster.

I am not sure what is best to do:
- Do only the minimal logic needed?
- Read the value redundantly even for new_cluster just to keep code simpler?

Either way, maybe the comment should say something about this.

==
.../t/003_logical_replication_slots.pl

8. Consider adding one more test

Maybe there should also be some "live check" test performed (e.g. using
--check, and a running old_cluster). This would demonstrate pg_upgrade
working successfully even when the WAL records are not consumed (because
LSN checks would be skipped in check_old_cluster_for_valid_slots
function).

--
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Extract numeric filed in JSONB more effectively
(Sorry for leaving this discussion for such a long time, how time flies!)

On Sun, Aug 27, 2023 at 6:28 AM Chapman Flack wrote:
> On 2023-08-22 08:16, Chapman Flack wrote:
> > On 2023-08-22 01:54, Andy Fan wrote:
> >> After we label it, we will get error like this:
> >>
> >> select (a->'a')::int4 from m;
> >> ERROR: cannot display a value of type internal
> >
> > Without looking in depth right now, I would double-check
> > what relabel node is being applied at the result. The idea,
> > of course, was to relabel the result as the expected result
>
> as I suspected, looking at v10-0003, there's this:
>
> + fexpr = (FuncExpr *) makeRelabelType((Expr *) fexpr, INTERNALOID,
> + 0, InvalidOid,
> COERCE_IMPLICIT_CAST);
>
> compared to the example I had sent earlier:
>
> On 2023-08-18 17:02, Chapman Flack wrote:
> > expr = (Expr *) makeRelabelType((Expr *) fexpr,
> > targetOid, -1, InvalidOid, COERCE_IMPLICIT_CAST);
>
> The key difference: this is the label going on the result type
> of the function we are swapping in.

I feel we have some understanding gap in this area; let's see what it
is. Suppose the original query is:

numeric(jsonb_object_field(v_jsonb, text)) -> numeric.

Without the patch 003, the rewritten query is:

jsonb_object_field_type(NUMERICOID, v_jsonb, text) -> NUMERIC.

However the declared type of jsonb_object_field_type is:

jsonb_object_field_type(internal, jsonb, text) -> internal.

So the situation is:
a). We input a CONST(type=OIDOID, ..) for an internal argument.
b). We return a NUMERIC type, which matches the original query.
c). The result type NUMERIC doesn't match the declared type 'internal'.
d). It doesn't match the run-time type of the internal argument, which is OID.

Case a) is fixed by RelabelType. Case b) shouldn't be treated as an
issue. I thought you wanted to address case c), and patch 003 tries to
fix it, hence the ERROR above. At last I realized case c) isn't the one
you want to fix. Case d) shouldn't be a requirement in the first place,
IIUC.
So your new method is:

1. jsonb_{op}_start() -> internal (internal is actually JsonbValue).
2. jsonb_finish_{type}(internal, ..) -> type. (internal is JsonbValue).

This avoids case a) at the very beginning. I'd like to provide patches
for both solutions for comparison. Are there any other benefits of this
method that I am missing?

> Two more minor differences: (1) the node you get from
> makeRelabelType is an Expr, but not really a FuncExpr. Casting
> it to FuncExpr is a bit bogus. Also, the third argument to
> makeRelabelType is a typmod, and I believe the "not-modified"
> typmod is -1, not 0.

My fault, you are right.

> > On 2023-08-21 06:50, Andy Fan wrote:
> > I'm not very excited with this manner, reasons are: a). It will have
> > to emit more steps in ExprState->steps which will be harmful for
> > execution. The overhead is something I'm not willing to afford.
>
> I would be open to a performance comparison, but offhand I am not
> sure whether the overhead of another step or two in an ExprState
> is appreciably more than some of the overhead in the present patch,
> such as the every-time-through fcinfo initialization buried in
> DirectFunctionCall1 where you don't necessarily see it. I bet the
> fcinfo in an ExprState step gets set up once, and just has
> new argument values slammed into it each time through.

The fcinfo initialization in DirectFunctionCall1 is an interesting
point! So I am persuaded that the extra steps in ExprState may not be
worse than the current way, due to the "every-time-through fcinfo
initialization" (in which case the memory is allocated once on the heap
rather than every time on the stack). I can do a comparison at last to
see if we can find some other interesting findings.

> I would not underestimate the benefit of reducing the code
> duplication and keeping the patch as clear as possible.
> The key contributions of the patch are getting a numeric or
> boolean efficiently out of the JSON operation. Getting from
> numeric to int or float are things the system already does
> well.

True; reusing the casting system should be better than hard-coding the
casting function manually. I'd apply this to both methods.

> A patch that focuses on what it contributes, and avoids
> redoing things the system already can do--unless the duplication
> can be shown to have a strong performance benefit--is easier to
> review and probably to get integrated.

Agreed. At last, thanks for the great insights and patience!

--
Best Regards
Andy Fan
Re: Synchronizing slots from primary to standby
On Wed, Aug 23, 2023 at 4:21 PM Dilip Kumar wrote:
>
> On Wed, Aug 23, 2023 at 3:38 PM shveta malik wrote:
>
> I have reviewed the v12-0002 patch and I have some comments. I see the
> latest version posted sometime back and if any of this comment is
> already fixed in this version then feel free to ignore that.
>
> In general, code still needs a lot more comments to make it readable
> and in some places, code formatting is also not as per PG standard so
> that needs to be improved.
> There are some other specific comments as listed below
>

Please see the latest patch-set (v14). Did some code-formatting, used
pg_indent as well. Added more comments. Let me know specifically if some
more comments or formatting is needed.

> 1.
> @@ -925,7 +936,7 @@ ApplyLauncherRegister(void)
>  	memset(&bgw, 0, sizeof(bgw));
>  	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
>  		BGWORKER_BACKEND_DATABASE_CONNECTION;
> -	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
> +	bgw.bgw_start_time = BgWorkerStart_ConsistentState;
>
> What is the reason for this change, can you add some comments?

Sure, done.

> 2.
> ApplyLauncherShmemInit(void)
> {
> 	bool found;
> +	bool foundSlotSync;
>
> Is there any specific reason to launch the sync worker from the
> logical launcher instead of making this independent?
> I mean in the future if we plan to sync physical slots as well then it
> wouldn't be an expandable design.

> 3.
> +	/*
> +	 * Remember the old dbids before we stop and cleanup this worker
> +	 * as these will be needed in order to relaunch the worker.
> +	 */
> +	copied_dbcnt = worker->dbcount;
> +	copied_dbids = (Oid *) palloc0(worker->dbcount * sizeof(Oid));
> +
> +	for (i = 0; i < worker->dbcount; i++)
> +		copied_dbids[i] = worker->dbids[i];
>
> probably we can just memcpy the memory containing the dbids.

Done.

> 4.
> +	/*
> +	 * If worker is being reused, and there is vacancy in dbids array,
> +	 * just update dbids array and dbcount and we are done.
> +	 * But if dbids array is exhausted, stop the worker, reallocate
> +	 * dbids in dsm, relaunch the worker with same set of dbs as earlier
> +	 * plus the new db.
> +	 */
>
> Why do we need to relaunch the worker, can't we just use dsa_pointer
> to expand the shared memory whenever required?

Done.

> 5.
>
> +static bool
> +WaitForSlotSyncWorkerAttach(SlotSyncWorker *worker,
> +							uint16 generation,
> +							BackgroundWorkerHandle *handle)
>
> this function is an exact duplicate of WaitForReplicationWorkerAttach
> except for the LWlock, why don't we use the same function by passing
> the LWLock as a parameter
>
> 6.
> +/*
> + * Attach Slot-sync worker to worker-slot assigned by launcher.
> + */
> +void
> +slotsync_worker_attach(int slot)
>
> this is also very similar to the logicalrep_worker_attach function.
>
> Please check other similar functions and reuse them wherever possible

Will revisit these as stated in [1].

[1]: https://www.postgresql.org/message-id/CAJpy0uDeWjJj%2BU8nn%2BHbnGWkfY%2Bn-Bbw_kuHqgphETJ1Lucy%2BQ%40mail.gmail.com

> --
> Regards,
> Dilip Kumar
> EnterpriseDB: http://www.enterprisedb.com
Re: persist logical slots to disk during shutdown checkpoint
On Wed, Aug 30, 2023 at 9:03 AM Julien Rouhaud wrote: > > On Tue, Aug 29, 2023 at 02:21:15PM +0530, Amit Kapila wrote: > > On Tue, Aug 29, 2023 at 10:16 AM vignesh C wrote: > > > > > > That makes sense. The attached v6 version has the changes for the > > > same, apart from this I have also fixed a) pgindent issues b) perltidy > > > issues c) one variable change (flush_lsn_changed to > > > confirmed_flush_has_changed) d) corrected few comments in the test > > > file. Thanks to Peter for providing few offline comments. > > > > > > > The latest version looks good to me. Julien, Ashutosh, and others, > > unless you have more comments or suggestions, I would like to push > > this in a day or two. > > Unfortunately I'm currently swamped with some internal escalations so I > couldn't keep up closely with the latest activity here. > > I think I recall that you wanted to > change the timing at which logical slots are shutdown, I'm assuming that this > change won't lead to always have a difference between the LSN and latest > persisted LSN being different? > I think here by LSN you are referring to confirmed_flush LSN. If so, this doesn't create any new difference between the values for the confirmed_flush LSN in memory and in disk. We just remember the last persisted value to avoid writes of slots at shutdown time. > Otherwise saving the latest persisted LSN to > try to avoid persisting again all logical slots on shutdown seems reasonable > to > me. Thanks for responding. -- With Regards, Amit Kapila.
Re: pg_stat_get_backend_subxact() and backend IDs?
On Tue, Aug 29, 2023 at 07:01:51PM -0700, Nathan Bossart wrote:
> On Wed, Aug 30, 2023 at 08:22:27AM +0900, Michael Paquier wrote:
>> +extern LocalPgBackendStatus *pgstat_get_local_beentry_by_index(int beid);
>>
>> Still I would to a bit more of s/beid/id/ for cases where the code
>> refers to an index ID, and not a backend ID, especially for the
>> internal routines.
>
> Makes sense. I did this in v4.

Yep, that looks more consistent, at quick glance.
--
Michael
Re: Synchronizing slots from primary to standby
On Fri, Aug 25, 2023 at 11:09 AM shveta malik wrote: > > On Wed, Aug 23, 2023 at 4:21 PM Dilip Kumar wrote: > > > > On Wed, Aug 23, 2023 at 3:38 PM shveta malik wrote: > > > > > I have reviewed the v12-0002 patch and I have some comments. I see the > > latest version posted sometime back and if any of this comment is > > already fixed in this version then feel free to ignore that. > > > > Thanks for the feedback. Please find my comments on a few. I will work on > rest. > > > > 2. > > ApplyLauncherShmemInit(void) > > { > > bool found; > > + bool foundSlotSync; > > > > Is there any specific reason to launch the sync worker from the > > logical launcher instead of making this independent? > > I mean in the future if we plan to sync physical slots as well then it > > wouldn't be an expandable design. > > When we started working on this, it was reusing logical-apply worker > infra, so I separated it from logical-apply worker but let it be > managed by a replication launcher considering that only logical slots > needed to be synced. I think this needs more thought and I would like > to know from others as well before concluding anything here. > > > > 5. > > > > +static bool > > +WaitForSlotSyncWorkerAttach(SlotSyncWorker *worker, > > +uint16 generation, > > +BackgroundWorkerHandle *handle) > > > > this function is an exact duplicate of WaitForReplicationWorkerAttach > > except for the LWlock, why don't we use the same function by passing > > the LWLock as a parameter > > > > Here workers (first argument) are different. We can always pass > LWLock, but since workers are different, in order to merge the common > functionality, we need to have some common worker structure between > the two workers (apply and sync-slot) and pass that to functions which > need to be merged (similar to NodeTag used in Insert/CreateStmt etc). > But changing LogicalRepWorker() would mean changing > applyworker/table-sync worker/parallel-apply-worker files. 
> Since there are only two such functions which you pointed out (attach
> and wait_for_attach), I preferred to keep the functions as is until we
> conclude on where the slot-sync worker functionality actually fits in.
> I can revisit these comments then. Or if you see any better way to do
> it, kindly let me know.
>
> > 6.
> > +/*
> > + * Attach Slot-sync worker to worker-slot assigned by launcher.
> > + */
> > +void
> > +slotsync_worker_attach(int slot)
> >
> > this is also very similar to the logicalrep_worker_attach function.
> >
> > Please check other similar functions and reuse them wherever possible
> >
> > Also, why this function is not registering the cleanup function on
> > shmmem_exit?
>
> It is doing it in ReplSlotSyncMain() since we have dsm-seg there.
> Please see this:
>
> /* Primary initialization is complete. Now, attach to our slot. */
> slotsync_worker_attach(worker_slot);
> before_shmem_exit(slotsync_worker_detach, PointerGetDatum(seg));

PFA new patch-set which attempts to fix these:

a) Synced slots on standby are not consumable, i.e.
pg_logical_slot_get/peek_changes will give an error on these, while they
will work on user-created slots.

b) User-created slots on standby will not be dropped by slot-sync
workers anymore. Earlier the slot-sync worker was dropping all the slots
which were not part of synchronize_slot_names.

c) Now DSA is being used for dbids to facilitate memory extension if
required without needing to restart the worker. Earlier dsm was used
alone, which needed a restart of the worker in case the allocated memory
needed to be extended. Changes are in patch 0002.

Next in pipeline:

1. Handling of corner scenarios which I explained in:
https://www.postgresql.org/message-id/CAJpy0uC%2B2agRtF3H%3Dn-hW5JkoPfaZkjPXJr%3D%3Dy3_PRE04dQvhw%40mail.gmail.com

2. Revisiting comments (older ones in this thread and latest given) for
patch 1.

thanks
Shveta

v14-0001-Allow-logical-walsenders-to-wait-for-physical-st.patch
v14-0002-Add-logical-slot-sync-capability-to-physical-sta.patch
Re: persist logical slots to disk during shutdown checkpoint
Hi, On Tue, Aug 29, 2023 at 02:21:15PM +0530, Amit Kapila wrote: > On Tue, Aug 29, 2023 at 10:16 AM vignesh C wrote: > > > > That makes sense. The attached v6 version has the changes for the > > same, apart from this I have also fixed a) pgindent issues b) perltidy > > issues c) one variable change (flush_lsn_changed to > > confirmed_flush_has_changed) d) corrected few comments in the test > > file. Thanks to Peter for providing few offline comments. > > > > The latest version looks good to me. Julien, Ashutosh, and others, > unless you have more comments or suggestions, I would like to push > this in a day or two. Unfortunately I'm currently swamped with some internal escalations so I couldn't keep up closely with the latest activity here. I think I recall that you wanted to change the timing at which logical slots are shutdown, I'm assuming that this change won't lead to always have a difference between the LSN and latest persisted LSN being different? Otherwise saving the latest persisted LSN to try to avoid persisting again all logical slots on shutdown seems reasonable to me.
Re: Debian 12 gcc warning
Bruce Momjian writes: > On Tue, Aug 29, 2023 at 10:18:36AM -0400, Tom Lane wrote: >> That seems like a pretty clear compiler bug, particularly since it just >> appears in this one version. Rather than contorting our code, I'd >> suggest filing a gcc bug. > I assume I have to create a test case to report this to the gcc team. I > tried to create such a test case on gcc 12 but it doesn't generate the > warning. Attached is my attempt. Any ideas? I assume we can't just > tell them to download our software and compile it. IIRC, they'll accept preprocessed compiler input for the specific file; you don't need to provide a complete source tree. Per https://gcc.gnu.org/bugs/ Please include all of the following items, the first three of which can be obtained from the output of gcc -v: the exact version of GCC; the system type; the options given when GCC was configured/built; the complete command line that triggers the bug; the compiler output (error messages, warnings, etc.); and the preprocessed file (*.i*) that triggers the bug, generated by adding -save-temps to the complete compilation command, or, in the case of a bug report for the GNAT front end, a complete set of source files (see below). Obviously, if you can trim the input it's good, but it doesn't have to be a minimal reproducer. regards, tom lane
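[Editor's note] The collection steps Tom lists can be sketched as a shell session. Here repro.c is a hypothetical stand-in for the real source file that triggers the warning, and the compile flags are illustrative, not the actual build flags:

```shell
# A tiny stand-in source file; the real report would use the file
# that produces the spurious warning.
cat > repro.c <<'EOF'
#include <stdio.h>
int main(void) { printf("hello\n"); return 0; }
EOF

# 1. Record the exact compiler version, system type, and configure
#    options; gcc -v prints these to stderr.
gcc -v 2> gcc-version.txt

# 2. Re-run the failing compilation with -save-temps added; this keeps
#    the preprocessed file (repro.i) next to the source.
gcc -Wall -O2 -c -save-temps repro.c

# 3. Attach repro.i (plus the full command line and the warning output)
#    to the bug report; trimming it to a minimal reproducer is optional.
ls repro.i
```

The key point is that the preprocessed *.i file lets the GCC developers rebuild the translation unit without needing the project's headers or source tree.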
Re: [PoC] pg_upgrade: allow to upgrade publisher node
Here are some minor review comments for patch v28-0002

==
src/sgml/ref/pgupgrade.sgml

1.
- with the primary.) Replication slots are not copied and must
- be recreated.
+ with the primary.) Replication slots on old standby are not copied.
+ Only logical slots on the primary are migrated to the new standby,
+ and other slots must be recreated.

/on old standby/on the old standby/

==
src/bin/pg_upgrade/info.c

2. get_old_cluster_logical_slot_infos

+void
+get_old_cluster_logical_slot_infos(void)
+{
+	int			dbnum;
+
+	/* Logical slots can be migrated since PG17. */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	pg_log(PG_VERBOSE, "\nsource databases:");
+
+	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *pDbInfo = &old_cluster.dbarr.dbs[dbnum];
+
+		get_old_cluster_logical_slot_infos_per_db(pDbInfo);
+
+		if (log_opts.verbose)
+		{
+			pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+			print_slot_infos(&pDbInfo->slot_arr);
+		}
+	}
+}

It might be worth putting an Assert before calling the
get_old_cluster_logical_slot_infos_per_db(...) just as a sanity check:

Assert(pDbInfo->slot_arr.nslots == 0);

This also helps to better document the "Note" of the
count_old_cluster_logical_slots() function comment.

~~~

3. count_old_cluster_logical_slots

+/*
+ * count_old_cluster_logical_slots()
+ *
+ * Sum up and return the number of logical replication slots for all databases.
+ *
+ * Note: this function always returns 0 if the old_cluster is PG16 and prior
+ * because old_cluster.dbarr.dbs[dbnum].slot_arr is set only for PG17 and
+ * later.
+ */
+int
+count_old_cluster_logical_slots(void)

Maybe that "Note" should be expanded a bit to say who does this:

SUGGESTION
Note: This function always returns 0 if the old_cluster is PG16 and
prior because old_cluster.dbarr.dbs[dbnum].slot_arr is set only for PG17
and later. See where get_old_cluster_logical_slot_infos_per_db() is
called.

==
src/bin/pg_upgrade/pg_upgrade.c

4.

+	/*
+	 * Logical replication slot upgrade only supported for old_cluster >=
+	 * PG17.
+	 *
+	 * Note: This must be done after doing the pg_resetwal command because
+	 * pg_resetwal would remove required WALs.
+	 */
+	if (count_old_cluster_logical_slots())
+	{
+		start_postmaster(&new_cluster, true);
+		create_logical_replication_slots();
+		stop_postmaster(false);
+	}
+

4a.
I felt this comment needs a bit more detail otherwise you can't tell how
the >= PG17 version check works.

4b.
/slot upgrade only supported/slot upgrade is only supported/

~

SUGGESTION
Logical replication slot upgrade is only supported for old_cluster >=
PG17. An explicit version check is not necessary here because function
count_old_cluster_logical_slots() will always return 0 for old_cluster
<= PG16.

--
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Missing comments/docs about custom scan path
On Tue, Aug 29, 2023 at 5:08 PM Etsuro Fujita wrote: > Another thing I would like to propose is minor adjustments to the docs > related to parallel query: > > A custom scan provider will typically add paths for a base relation by > setting the following hook, which is called after the core code has > generated all the access paths it can for the relation (except for > Gather paths, which are made after this call so that they can use > partial paths added by the hook): > > For clarity, I think "except for Gather paths" should be "except for > Gather and Gather Merge paths". > > Although this hook function can be used to examine, modify, or remove > paths generated by the core system, a custom scan provider will > typically confine itself to generating CustomPath objects and adding > them to rel using add_path. > > For clarity, I think "adding them to rel using add_path" should be eg, > "adding them to rel using add_path, or using add_partial_path if they > are partial paths". +1. I can see that this change makes the doc more consistent with the comments in set_rel_pathlist. Thanks Richard
Re: pg_stat_get_backend_subxact() and backend IDs?
On Wed, Aug 30, 2023 at 08:22:27AM +0900, Michael Paquier wrote: > On Tue, Aug 29, 2023 at 09:46:55AM -0700, Nathan Bossart wrote: >> This was my first reaction [0]. I was concerned about renaming the >> exported functions so close to release, so I was suggesting that we hold >> off on that part until v17. If there isn't a concern with renaming these >> functions in v16, I can proceed with something more like v2. > > Thanks for the pointer. This version is much better IMO, because it > removes entirely the source of the confusion between the difference in > backend ID and index ID treatments when fetching the local entries in > the array. So I'm okay to rename these functions now, before .0 is > released to get things in a better shape while addressing the issue > reported. Okay. > +extern LocalPgBackendStatus *pgstat_get_local_beentry_by_index(int beid); > > Still I would to a bit more of s/beid/id/ for cases where the code > refers to an index ID, and not a backend ID, especially for the > internal routines. Makes sense. I did this in v4. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 3e360c1b970526079528159fe1c49be03d0b13b5 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Thu, 24 Aug 2023 07:42:54 -0700 Subject: [PATCH v4 1/2] rename some pgstat functions --- src/backend/utils/activity/backend_status.c | 28 src/backend/utils/adt/pgstatfuncs.c | 36 ++--- src/include/utils/backend_status.h | 4 +-- 3 files changed, 34 insertions(+), 34 deletions(-) diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c index 38f91a495b..a4860b10fc 100644 --- a/src/backend/utils/activity/backend_status.c +++ b/src/backend/utils/activity/backend_status.c @@ -849,8 +849,8 @@ pgstat_read_current_status(void) /* * The BackendStatusArray index is exactly the BackendId of the * source backend. Note that this means localBackendStatusTable - * is in order by backend_id. pgstat_fetch_stat_beentry() depends - * on that. 
+ * is in order by backend_id. pgstat_get_beentry_by_backend_id() + * depends on that. */ localentry->backend_id = i; BackendIdGetTransactionIds(i, @@ -1073,21 +1073,21 @@ cmp_lbestatus(const void *a, const void *b) } /* -- - * pgstat_fetch_stat_beentry() - + * pgstat_get_beentry_by_backend_id() - * * Support function for the SQL-callable pgstat* functions. Returns * our local copy of the current-activity entry for one backend, * or NULL if the given beid doesn't identify any known session. * * The beid argument is the BackendId of the desired session - * (note that this is unlike pgstat_fetch_stat_local_beentry()). + * (note that this is unlike pgstat_get_local_beentry_by_index()). * * NB: caller is responsible for a check if the user is permitted to see * this info (especially the querystring). * -- */ PgBackendStatus * -pgstat_fetch_stat_beentry(BackendId beid) +pgstat_get_beentry_by_backend_id(BackendId beid) { LocalPgBackendStatus key; LocalPgBackendStatus *ret; @@ -,13 +,13 @@ pgstat_fetch_stat_beentry(BackendId beid) /* -- - * pgstat_fetch_stat_local_beentry() - + * pgstat_get_local_beentry_by_index() - * - * Like pgstat_fetch_stat_beentry() but with locally computed additions (like - * xid and xmin values of the backend) + * Like pgstat_get_beentry_by_backend_id() but with locally computed additions + * (like xid and xmin values of the backend) * - * The beid argument is a 1-based index in the localBackendStatusTable - * (note that this is unlike pgstat_fetch_stat_beentry()). + * The idx argument is a 1-based index in the localBackendStatusTable + * (note that this is unlike pgstat_get_beentry_by_backend_id()). * Returns NULL if the argument is out of range (no current caller does that). 
* * NB: caller is responsible for a check if the user is permitted to see @@ -1125,14 +1125,14 @@ pgstat_fetch_stat_beentry(BackendId beid) * -- */ LocalPgBackendStatus * -pgstat_fetch_stat_local_beentry(int beid) +pgstat_get_local_beentry_by_index(int idx) { pgstat_read_current_status(); - if (beid < 1 || beid > localNumBackends) + if (idx < 1 || idx > localNumBackends) return NULL; - return [beid - 1]; + return [idx - 1]; } @@ -1141,7 +1141,7 @@ pgstat_fetch_stat_local_beentry(int beid) * * Support function for the SQL-callable pgstat* functions. Returns * the number of sessions known in the localBackendStatusTable, i.e. - * the maximum 1-based index to pass to pgstat_fetch_stat_local_beentry(). + * the maximum 1-based index to pass to pgstat_get_local_beentry_by_index(). * -- */ int diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c index 2b9742ad21..49cc887b97 100644 --- a/src/backend/utils/adt/pgstatfuncs.c +++ b/src/backend/utils/adt/pgstatfuncs.c @@ -211,7 +211,7 @@
Re: pgbench: allow to exit immediately when any client is aborted
On Wed, 30 Aug 2023 10:11:10 +0900 (JST) Tatsuo Ishii wrote: > Yugo, > Fabien, > > >>> I start to think this behavior is ok and consistent with previous > >>> behavior of pgbench because serialization (and dealock) errors have > >>> been treated specially from other types of errors, such as accessing > >>> non existing tables. However, I suggest to add more sentences to the > >>> explanation of this option so that users are not confused by this. > >>> > >>> + > >>> + --exit-on-abort > >>> + > >>> + > >>> +Exit immediately when any client is aborted due to some error. > >>> Without > >>> +this option, even when a client is aborted, other clients could > >>> continue > >>> +their run as specified by -t or > >>> -T option, > >>> +and pgbench will print an incomplete > >>> results > >>> +in this case. > >>> + > >>> + > >>> + > >>> + > >>> > >>> What about inserting "Note that serialization failures or deadlock > >>> failures will not abort client. See >>> linkend="failures-and-retries"/> for more information." into the end > >>> of this paragraph? > >> > >> --exit-on-abort is related to "abort" of a client instead of error or > >> failure itself, so rather I wonder a bit that mentioning > >> serialization/deadlock > >> failures might be confusing. However, if users may think of such failures > >> from > >> "abort", it could be beneficial to add the sentences with some > >> modification as > >> below. > > > > I myself confused by this and believe that adding extra paragraph is > > beneficial to users. > > > >> "Note that serialization failures or deadlock failures does not abort the > >> client, so they are not affected by this option. > >> See for more information." > > > > "does not" --> "do not". > > > >>> BTW, I think: > >>> Exit immediately when any client is aborted due to some error. > >>> Without > >>> > >>> should be: > >>> Exit immediately when any client is aborted due to some errors. > >>> Without > >>> > >>> (error vs. 
erros)
> >>
> >> Well, I chose "some" to mean "unknown or unspecified", not "an
> >> unspecified amount or number of", so singular form "error" is used.
> >
> > Ok.
> >
> >> Instead, should we use "due to a error"?
> >
> > I don't think so.
> >
> >>> Also:
> >>> + --exit-on-abort is specified . Otherwise in
> >>> the worst
> >>>
> >>> There is an extra space between "specified" and ".".
> >>
> >> Fixed.
> >>
> >> Also, I fixed the place of the description in the documentation
> >> to alphabetical order
> >>
> >> Attached is the updated patch.
> >
> > Looks good to me. If there's no objection, I will commit this next week.
>
> I have pushed the patch. Thank you for the contributions!

Thank you!

Regards,
Yugo Nagata

> Best regards,
> --
> Tatsuo Ishii
> SRA OSS LLC
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp

--
Yugo NAGATA
Re: Fix shadow warnings in logical replication code
On Wed, Aug 30, 2023 at 8:49 AM Michael Paquier wrote: > There is much more going on with -Wshadow, but let's do things > incrementally, case by case. Yeah, IIRC the source tree currently is able to be built without any shadow-related warnings with -Wshadow=compatible-local. But with -Wshadow or -Wshadow=local, you can still see a lot of warnings. Thanks Richard
A proposal to revise \watch help message
Recently \watch got the following help message.

> \watch [[i=]SEC] [c=N] [m=MIN]
>   execute query every SEC seconds, up to N times
>   stop if less than MIN rows are returned

The "m=MIN" can be a bit misleading. It may look like it's about
interval or counts, but it actually refers to the row number, which is
not spelled out in the line.

Would it make sense to change it to MINROWS? There's enough room in
the line for that change and the doc already uses min_rows.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index 38c165a627..8c8616205d 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -200,9 +200,9 @@ slashUsage(unsigned short int pager)
 	HELP0(" \\gset [PREFIX] execute query and store result in psql variables\n");
 	HELP0(" \\gx [(OPTIONS)] [FILE] as \\g, but forces expanded output mode\n");
 	HELP0(" \\q quit psql\n");
-	HELP0(" \\watch [[i=]SEC] [c=N] [m=MIN]\n");
+	HELP0(" \\watch [[i=]SEC] [c=N] [m=MINROWS]\n");
 	HELP0(" execute query every SEC seconds, up to N times\n");
-	HELP0(" stop if less than MIN rows are returned\n");
+	HELP0(" stop if less than MINROWS rows are returned\n");
 	HELP0("\n");
 	HELP0("Help\n");
Re: Wrong usage of pqMsg_Close message code?
On Wed, Aug 30, 2023 at 07:56:33AM +0900, Michael Paquier wrote: > On Tue, Aug 29, 2023 at 02:11:06PM -0700, Nathan Bossart wrote: >> I plan to commit the attached patch shortly. > > WFM. Committed. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: A proposal to revise \watch help message
At Wed, 30 Aug 2023 10:21:26 +0900 (JST), Kyotaro Horiguchi wrote in
> Recently \watch got the following help message.
>
> > \watch [[i=]SEC] [c=N] [m=MIN]
> >   execute query every SEC seconds, up to N times
> >   stop if less than MIN rows are returned
>
> The "m=MIN" can be a bit misleading. It may look like it's about
> interval or counts, but it actually refers to the row number, which is
> not spelled out in the line.
>
> Would it make sense to change it to MINROWS? There's enough room in
> the line for that change and the doc already uses min_rows.

Mmm. I noticed the continuation lines are indented too much, probably
because of the backslash escape in the main line. The attached includes
the fix for that.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index 38c165a627..2da79a75f5 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -200,9 +200,9 @@ slashUsage(unsigned short int pager)
 	HELP0(" \\gset [PREFIX] execute query and store result in psql variables\n");
 	HELP0(" \\gx [(OPTIONS)] [FILE] as \\g, but forces expanded output mode\n");
 	HELP0(" \\q quit psql\n");
-	HELP0(" \\watch [[i=]SEC] [c=N] [m=MIN]\n");
-	HELP0(" execute query every SEC seconds, up to N times\n");
-	HELP0(" stop if less than MIN rows are returned\n");
+	HELP0(" \\watch [[i=]SEC] [c=N] [m=MINROWS]\n");
+	HELP0(" execute query every SEC seconds, up to N times\n");
+	HELP0(" stop if less than MINROWS rows are returned\n");
 	HELP0("\n");
 	HELP0("Help\n");
Re: should frontend tools use syncfs() ?
On Wed, Aug 30, 2023 at 09:10:47AM +0900, Michael Paquier wrote: > I understand that I'm perhaps sounding pedantic about fsync_pgdata().. > But, after thinking more about it, I would still make this code fail > hard with an exit(EXIT_FAILURE) to let any C code calling directly > this routine with sync_method = DATA_DIR_SYNC_METHOD_SYNCFS know that > the build does not allow the use of this option when we don't have > HAVE_SYNCFS. parse_sync_method() offers some protection, but adding > this restriction also in the execution path is more friendly than > falling back silently to the default of flushing each file if > fsync_pgdata() is called with syncfs but the build does not support > it. At least that's more predictible. That seems fair enough. I did this in v7. I restructured fsync_pgdata() and fsync_dir_recurse() so that any new sync methods should cause compiler warnings until they are implemented. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From f5ecb3c8b5d397c94a9f8ab9a053b3a0cfd7f854 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Mon, 21 Aug 2023 19:05:14 -0700 Subject: [PATCH v7 1/2] move PG_TEMP_FILE* macros to file_utils.h --- src/backend/backup/basebackup.c | 2 +- src/backend/postmaster/postmaster.c | 1 + src/backend/storage/file/fileset.c | 1 + src/bin/pg_checksums/pg_checksums.c | 10 -- src/bin/pg_rewind/filemap.c | 2 +- src/include/common/file_utils.h | 4 src/include/storage/fd.h| 4 7 files changed, 8 insertions(+), 16 deletions(-) diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c index 45be21131c..5d66014499 100644 --- a/src/backend/backup/basebackup.c +++ b/src/backend/backup/basebackup.c @@ -25,6 +25,7 @@ #include "commands/defrem.h" #include "common/compression.h" #include "common/file_perm.h" +#include "common/file_utils.h" #include "lib/stringinfo.h" #include "miscadmin.h" #include "nodes/pg_list.h" @@ -37,7 +38,6 @@ #include "storage/bufpage.h" #include "storage/checksum.h" #include 
"storage/dsm_impl.h" -#include "storage/fd.h" #include "storage/ipc.h" #include "storage/reinit.h" #include "utils/builtins.h" diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index d7bfb28ff3..54e9bfb8c4 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -94,6 +94,7 @@ #include "access/xlogrecovery.h" #include "catalog/pg_control.h" #include "common/file_perm.h" +#include "common/file_utils.h" #include "common/ip.h" #include "common/pg_prng.h" #include "common/string.h" diff --git a/src/backend/storage/file/fileset.c b/src/backend/storage/file/fileset.c index e9951b0269..84cae32548 100644 --- a/src/backend/storage/file/fileset.c +++ b/src/backend/storage/file/fileset.c @@ -25,6 +25,7 @@ #include "catalog/pg_tablespace.h" #include "commands/tablespace.h" +#include "common/file_utils.h" #include "common/hashfn.h" #include "miscadmin.h" #include "storage/ipc.h" diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c index 19eb67e485..9011a19b4e 100644 --- a/src/bin/pg_checksums/pg_checksums.c +++ b/src/bin/pg_checksums/pg_checksums.c @@ -52,16 +52,6 @@ typedef enum PG_MODE_ENABLE } PgChecksumMode; -/* - * Filename components. - * - * XXX: fd.h is not declared here as frontend side code is not able to - * interact with the backend-side definitions for the various fsync - * wrappers. 
- */ -#define PG_TEMP_FILES_DIR "pgsql_tmp" -#define PG_TEMP_FILE_PREFIX "pgsql_tmp" - static PgChecksumMode mode = PG_MODE_CHECK; static const char *progname; diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c index bd5c598e20..58280d9abc 100644 --- a/src/bin/pg_rewind/filemap.c +++ b/src/bin/pg_rewind/filemap.c @@ -27,12 +27,12 @@ #include #include "catalog/pg_tablespace_d.h" +#include "common/file_utils.h" #include "common/hashfn.h" #include "common/string.h" #include "datapagemap.h" #include "filemap.h" #include "pg_rewind.h" -#include "storage/fd.h" /* * Define a hash table which we can use to store information about the files diff --git a/src/include/common/file_utils.h b/src/include/common/file_utils.h index b7efa1226d..dd1532bcb0 100644 --- a/src/include/common/file_utils.h +++ b/src/include/common/file_utils.h @@ -46,4 +46,8 @@ extern ssize_t pg_pwritev_with_retry(int fd, extern ssize_t pg_pwrite_zeros(int fd, size_t size, off_t offset); +/* Filename components */ +#define PG_TEMP_FILES_DIR "pgsql_tmp" +#define PG_TEMP_FILE_PREFIX "pgsql_tmp" + #endif /* FILE_UTILS_H */ diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h index 6791a406fc..aea30c0622 100644 --- a/src/include/storage/fd.h +++ b/src/include/storage/fd.h @@ -195,8 +195,4 @@ extern int durable_unlink(const char *fname, int elevel); extern void SyncDataDirectory(void); extern int data_sync_elevel(int elevel); -/* Filename components
Re: pgbench: allow to exit immediately when any client is aborted
Yugo,
Fabien,

>>> I start to think this behavior is ok and consistent with previous
>>> behavior of pgbench because serialization (and deadlock) errors have
>>> been treated specially from other types of errors, such as accessing
>>> non existing tables. However, I suggest to add more sentences to the
>>> explanation of this option so that users are not confused by this.
>>>
>>> + --exit-on-abort
>>> +
>>> + Exit immediately when any client is aborted due to some error. Without
>>> + this option, even when a client is aborted, other clients could continue
>>> + their run as specified by -t or -T option,
>>> + and pgbench will print an incomplete results
>>> + in this case.
>>>
>>> What about inserting "Note that serialization failures or deadlock
>>> failures will not abort client. See <xref linkend="failures-and-retries"/>
>>> for more information." into the end of this paragraph?
>>
>> --exit-on-abort is related to "abort" of a client instead of error or
>> failure itself, so rather I wonder a bit that mentioning serialization/deadlock
>> failures might be confusing. However, if users may think of such failures
>> from "abort", it could be beneficial to add the sentences with some
>> modification as below.
>
> I myself was confused by this and believe that adding an extra paragraph
> is beneficial to users.
>
>> "Note that serialization failures or deadlock failures does not abort the
>> client, so they are not affected by this option.
>> See <xref linkend="failures-and-retries"/> for more information."
>
> "does not" --> "do not".
>
>>> BTW, I think:
>>> Exit immediately when any client is aborted due to some error. Without
>>> should be:
>>> Exit immediately when any client is aborted due to some errors. Without
>>> (error vs. errors)
>>
>> Well, I chose "some" to mean "unknown or unspecified", not "an unspecified
>> amount or number of", so the singular form "error" is used.
>
> Ok.
>
>> Instead, should we use "due to a error"?
>
> I don't think so.
>
>>> Also:
>>> + --exit-on-abort is specified . Otherwise in the worst
>>>
>>> There is an extra space between "specified" and ".".
>>
>> Fixed.
>>
>> Also, I fixed the place of the description in the documentation
>> to alphabetical order.
>>
>> Attached is the updated patch.
>
> Looks good to me. If there's no objection, I will commit this next week.

I have pushed the patch. Thank you for the contributions!

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
Re: Remove IndexInfo.ii_OpclassOptions field
On Tue, Aug 29, 2023 at 10:51:10AM +0200, Peter Eisentraut wrote: > At a glance, however, I think my patch is (a) not related, and (b) if it > were, it would probably *help*, because the change is to not allocate any > long-lived structures that no one needs and that might get out of date. Hmm, yeah, perhaps you're right about (b) here. I have a few other high-priority items for stable branches on my board before being able to look at all this in more details, unfortunately, so feel free to ignore me if you think that this is an improvement anyway even regarding the other issue discussed. -- Michael signature.asc Description: PGP signature
Re: Fix shadow warnings in logical replication code
On Wed, Aug 30, 2023 at 09:16:38AM +1000, Peter Smith wrote:
> logicalfuncs.c:184:13: warning: declaration of ‘name’ shadows a
> previous local [-Wshadow]
>    char *name = TextDatumGetCString(datum_opts[i]);
> logicalfuncs.c:105:8: warning: shadowed declaration is here [-Wshadow]
>    Name name;

A bit confusing here, particularly as the name is reused with
ReplicationSlotAcquire() at the end of pg_logical_slot_get_changes_guts()
once again.

> reorderbuffer.c:4843:10: warning: declaration of ‘isnull’ shadows a
> previous local [-Wshadow]
>    bool isnull;
> reorderbuffer.c:4734:11: warning: shadowed declaration is here [-Wshadow]
>    bool *isnull;

Agreed as well about this one.

> walsender.c:3543:14: warning: declaration of ‘sentPtr’ shadows a
> global declaration [-Wshadow]
>    XLogRecPtr sentPtr;
> walsender.c:155:19: warning: shadowed declaration is here [-Wshadow]
>    static XLogRecPtr sentPtr = InvalidXLogRecPtr;

This one looks pretty serious to me, particularly as the static sentPtr
is used quite a bit. It is fortunate that the impact is limited to the
WAL sender stat function.

Fixing all these seems like a good thing in the long term, so OK for me.
Like all the fixes similar to this one, I don't see a need for a
backpatch based on their locality, even if sentPtr makes me a bit
nervous to keep even in stable branches.

There is much more going on with -Wshadow, but let's do things
incrementally, case by case.
--
Michael

signature.asc
Description: PGP signature
Re: Strange presentation related to inheritance in \d+
At Tue, 29 Aug 2023 19:28:28 +0200, Alvaro Herrera wrote in
> On 2023-Aug-29, Kyotaro Horiguchi wrote:
>
> > Attached is the initial version of the patch. It prevents "CREATE
> > TABLE" from executing if there is an inconsistent not-null
> > constraint. Also I noticed that "ALTER TABLE t ADD NOT NULL c NO
> > INHERIT" silently ignores the "NO INHERIT" part and fixed it.
>
> Great, thank you. I pushed it after modifying it a bit -- instead of
> throwing the error in MergeAttributes, I did it in
> AddRelationNotNullConstraints(). It seems cleaner this way, mostly
> because we already have to match these two constraints there. (I guess

I agree that it is cleaner.

> you could argue that we waste catalog-insertion work before the error is
> reported and the whole thing is aborted; but I don't think this is a
> serious problem in practice.)

Given the rarity and the speed required, I agree that early-catching is
not that crucial here. Thanks for clearing that up.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Standardize spelling of "power of two"
At Tue, 29 Aug 2023 14:39:42 +0200, Daniel Gustafsson wrote in > > On 29 Aug 2023, at 13:11, Alvaro Herrera wrote: > > > > On 2023-Aug-29, Daniel Gustafsson wrote: > > > >> Agreed. While we have numerous "power of 2" these were the only ones > >> in translated user-facing messages, so I've pushed this to master (it > >> didn't seem worth disrupting translations for 16 as we are so close to > >> wrapping it, if others disagree I can backpatch). > > > > I'd rather backpatch it. There's only 5 translations that are 100% for > > initdb.po, and they have two weeks to make the change from "2" to "two". > > I don't think this is a problem. > > Fair enough, backpatched to v16. Thank you for committing this. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: should frontend tools use syncfs() ?
On Tue, Aug 29, 2023 at 08:45:59AM -0700, Nathan Bossart wrote:
> rebased

0001 looks OK, worth its own, independent, commit.

I understand that I'm perhaps sounding pedantic about fsync_pgdata()..
But, after thinking more about it, I would still make this code fail
hard with an exit(EXIT_FAILURE) to let any C code calling directly
this routine with sync_method = DATA_DIR_SYNC_METHOD_SYNCFS know that
the build does not allow the use of this option when we don't have
HAVE_SYNCFS. parse_sync_method() offers some protection, but adding
this restriction also in the execution path is more friendly than
falling back silently to the default of flushing each file if
fsync_pgdata() is called with syncfs but the build does not support
it. At least that's more predictable.

I'm fine with the doc changes.
--
Michael

signature.asc
Description: PGP signature
Re: pg_stat_get_backend_subxact() and backend IDs?
On Tue, Aug 29, 2023 at 09:46:55AM -0700, Nathan Bossart wrote:
> This was my first reaction [0]. I was concerned about renaming the
> exported functions so close to release, so I was suggesting that we hold
> off on that part until v17. If there isn't a concern with renaming these
> functions in v16, I can proceed with something more like v2.

Thanks for the pointer. This version is much better IMO, because it
removes entirely the source of the confusion between the difference in
backend ID and index ID treatments when fetching the local entries in
the array. So I'm okay to rename these functions now, before .0 is
released, to get things in a better shape while addressing the issue
reported.

+extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry_by_index(int beid);

Still I would do a bit more of s/beid/id/ for cases where the code
refers to an index ID, and not a backend ID, especially for the
internal routines.
--
Michael

signature.asc
Description: PGP signature
Fix shadow warnings in logical replication code
The -Wshadow compiler option reported 3 shadow warnings within the
logical replication files. (These are all in old code) PSA a patch to
address those.

==

logicalfuncs.c:184:13: warning: declaration of ‘name’ shadows a previous local [-Wshadow]
   char *name = TextDatumGetCString(datum_opts[i]);
logicalfuncs.c:105:8: warning: shadowed declaration is here [-Wshadow]
   Name name;

~~~

reorderbuffer.c:4843:10: warning: declaration of ‘isnull’ shadows a previous local [-Wshadow]
   bool isnull;
reorderbuffer.c:4734:11: warning: shadowed declaration is here [-Wshadow]
   bool *isnull;

~~~

walsender.c:3543:14: warning: declaration of ‘sentPtr’ shadows a global declaration [-Wshadow]
   XLogRecPtr sentPtr;
walsender.c:155:19: warning: shadowed declaration is here [-Wshadow]
   static XLogRecPtr sentPtr = InvalidXLogRecPtr;

--
Kind Regards,
Peter Smith.
Fujitsu Australia

v1-0001-Remove-shadows-found-in-logical-replication-files.patch
Description: Binary data
Re: Avoid a possible overflow (src/backend/utils/sort/logtape.c)
On Thu, Aug 24, 2023 at 07:02:40PM -0700, Peter Geoghegan wrote: > FWIW I'm pretty sure that it's impossible to run into problems here in > practice -- the minheap is allocated by palloc(), and the high > watermark number of free pages is pretty small. Even still, I agree > with your conclusion. There is really no reason to not be consistent > here. Postgres 16 RC1 is now tagged, so applied down to 13. -- Michael signature.asc Description: PGP signature
Re: Allow specifying a dbname in pg_basebackup connection string
Hi Jelte

On 29.08.23 15:55, Jelte Fennema wrote:
> Thanks for the review. I've updated the documentation to make it
> clearer (using some of your suggestions but also some others)

This patch applies and builds cleanly, and the documentation is very
clear. I tested it using the 'replication-support' branch from your
github fork:

pg_basebackup --dbname "port=6432 user=postgres dbname=foo" -D /tmp/dump1/

pgbouncer log:

2023-08-30 00:50:52.866 CEST [811770] LOG C-0x555fbd65bf40: (nodb)/postgres@unix(811776):6432 login attempt: db=foo user=postgres tls=no replication=yes

However, pgbouncer closes with a segmentation fault, so I couldn't test
the result of pg_basebackup itself - but I guess it isn't the issue here.

Other than that, everything looks good to me.

Jim
Re: Wrong usage of pqMsg_Close message code?
On Tue, Aug 29, 2023 at 02:11:06PM -0700, Nathan Bossart wrote: > I plan to commit the attached patch shortly. WFM. -- Michael signature.asc Description: PGP signature
Re: Autogenerate some wait events code and documentation
On Tue, Aug 29, 2023 at 02:21:48PM +0200, Alvaro Herrera wrote: > Yeah, I have a mild preference for keeping the prefix, but it's mild > because I also imagine that if somebody doesn't see the full symbol name > when grepping they will think to remove the prefix. So only -0.1. So, are you fine with the patch as presented? Or are there other things you'd like to see changed in the format? > I think the DOCONLY stuff should be better documented; they make no > sense without looking at the commit message for fa88928470b5. Good point. However, with 0002 in place these are gone. -- Michael signature.asc Description: PGP signature
Re: Is pg_regress --use-existing used by anyone or is it broken?
On Tue, Aug 29, 2023 at 3:37 PM Daniel Gustafsson wrote: > > It's handy when using pg_regress with a custom test suite, where I > > don't want to be nagged about disconnecting from the database every > > time. > > I'm curious about your workflow around it, it seems to me that it's kind of > broken so I wonder if we instead then should make it an equal citizen with > temp > instance? I'm confused. You seem to think that it's a problem that --use-existing doesn't create databases and roles. But that's the whole point, at least for me. I don't use --use-existing to run the standard regression tests, or anything like that. I use it to run my own custom test suite, often while relying upon the database having certain data already. Sometimes it's a nontrivial amount of data. I don't want to have to set up and tear down the data every time, since it isn't usually necessary. I usually have a relatively small and fast running read-only test suite, and a larger test suite that does indeed need to do various setup and teardown steps. It isn't possible to run the smaller test suite without having first run the larger one at least once. But this is just for me, during development. Right now, with my SAOP nbtree project, the smaller test suite takes me about 50ms to run, while the larger one takes almost 10 seconds. -- Peter Geoghegan
Re: Is pg_regress --use-existing used by anyone or is it broken?
> On 30 Aug 2023, at 00:33, Peter Geoghegan wrote:
>
> On Tue, Aug 29, 2023 at 2:53 PM Daniel Gustafsson wrote:
>> Having looked a bit more on it I have a feeling that plain removing it
>> would be the best option. Unless someone chimes in as a user of it I'll
>> propose a patch to remove it.
>
> -1. I use it.

Thanks for confirming!

> It's handy when using pg_regress with a custom test suite, where I
> don't want to be nagged about disconnecting from the database every
> time.

I'm curious about your workflow around it; it seems to me that it's kind
of broken, so I wonder whether we should instead make it an equal citizen
with the temp instance?

--
Daniel Gustafsson
Re: Is pg_regress --use-existing used by anyone or is it broken?
On Tue, Aug 29, 2023 at 2:53 PM Daniel Gustafsson wrote: > Having looked a bit more on it I have a feeling that plain removing it would > be > the best option. Unless someone chimes in as a user of it I'll propose a > patch > to remove it. -1. I use it. It's handy when using pg_regress with a custom test suite, where I don't want to be nagged about disconnecting from the database every time. -- Peter Geoghegan
Re: Is pg_regress --use-existing used by anyone or is it broken?
> On 29 Aug 2023, at 23:38, Nathan Bossart wrote: > On Mon, Aug 28, 2023 at 03:11:15PM +0200, Daniel Gustafsson wrote: >> Does anyone here use it? > > I don't think I've ever used it. AFAICT it was added with hot standby mode > (efc16ea) to support 'make standbycheck', which was removed last year > (4483b2cf). Unless someone claims to be using it, it's probably fine to > just remove it. Having looked a bit more on it I have a feeling that plain removing it would be the best option. Unless someone chimes in as a user of it I'll propose a patch to remove it. -- Daniel Gustafsson
Re: Query execution in Perl TAP tests needs work
On Wed, Aug 30, 2023 at 1:49 AM Noah Misch wrote:
> On Tue, Aug 29, 2023 at 04:25:24PM +1200, Thomas Munro wrote:
> > On Tue, Aug 29, 2023 at 1:48 PM Noah Misch wrote:
> > > https://github.com/cpan-authors/IPC-Run/issues/166#issuecomment-1288190929
> >
> > Interesting. But that shows a case with no pipes connected, using
> > select() as a dumb sleep and ignoring SIGCHLD. In our usage we have
> > pipes connected, and I think select() should return when the child's
> > output pipes become readable due to EOF. I guess something about that
> > might be b0rked on Windows? I see there is an extra helper process
> > doing socket<->pipe conversion (hah, that explains an extra ~10ms at
> > the start in my timestamps)...
>
> In that case, let's assume it's not the same issue.

Yeah, I think it amounts to the same thing, if EOF never arrives. I
suspect that we could get ->safe_psql() down to about ~25ms baseline if
someone could fix the posited IPC::Run EOF bug and change its internal
helper process to a helper thread. Even if we fix tests to reuse
backends, I expect that'd help CI measurably. (The native way would be
to use pipes directly, changing select() to WaitForMultipleObjects(),
but I suspect IPC::Run might have painted itself into a corner by
exposing the pseudo-pipes and documenting that they can be used with
select(). Oh well.)
Re: Is pg_regress --use-existing used by anyone or is it broken?
On Mon, Aug 28, 2023 at 03:11:15PM +0200, Daniel Gustafsson wrote: > When looking at pg_regress I noticed that the --use-existing support didn't > seem to work. ISTM that the removal/creation of test databases and roles > doesn't run since the conditional is reversed. There is also no support for > using a non-default socket directory with PG_REGRESS_SOCK_DIR. The attached > hack fixes these and allows the tests to execute for me, but even with that > the > test_setup suite fails due to the tablespace not being dropped and recreated > like databases and roles. > > Is it me who is too thick to get it working, or is it indeed broken? If it's > the latter, it's been like that for a long time which seems to indicate that > it > isn't really used and should probably be removed rather than fixed? > > Does anyone here use it? I don't think I've ever used it. AFAICT it was added with hot standby mode (efc16ea) to support 'make standbycheck', which was removed last year (4483b2cf). Unless someone claims to be using it, it's probably fine to just remove it. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Wrong usage of pqMsg_Close message code?
On Tue, Aug 29, 2023 at 09:15:55AM -0700, Nathan Bossart wrote:
> Thanks for the report. I'll get this fixed up. My guess is that this was
> leftover from an earlier version of the patch that used the same macro for
> identical protocol characters.

I plan to commit the attached patch shortly.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

>From 3a895c081fb26a2f818d07c7d28a54b94c278391 Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Tue, 29 Aug 2023 14:01:18 -0700
Subject: [PATCH v2 1/1] Fix misuse of PqMsg_Close.

EndCommand() and EndReplicationCommand() should use PqMsg_CommandComplete
instead. Oversight in commit f4b54e1ed9.

Reported-by: Pavel Stehule, Tatsuo Ishii
Author: Pavel Stehule
Reviewed-by: Aleksander Alekseev, Michael Paquier
---
 src/backend/tcop/dest.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/tcop/dest.c b/src/backend/tcop/dest.c
index 06d1872b9a..bc8494ee7d 100644
--- a/src/backend/tcop/dest.c
+++ b/src/backend/tcop/dest.c
@@ -176,7 +176,7 @@ EndCommand(const QueryCompletion *qc, CommandDest dest, bool force_undecorated_o
 			len = BuildQueryCompletionString(completionTag, qc,
 											 force_undecorated_output);
-			pq_putmessage(PqMsg_Close, completionTag, len + 1);
+			pq_putmessage(PqMsg_CommandComplete, completionTag, len + 1);
 		case DestNone:
 		case DestDebug:
@@ -200,7 +200,7 @@ EndCommand(const QueryCompletion *qc, CommandDest dest, bool force_undecorated_o
 void
 EndReplicationCommand(const char *commandTag)
 {
-	pq_putmessage(PqMsg_Close, commandTag, strlen(commandTag) + 1);
+	pq_putmessage(PqMsg_CommandComplete, commandTag, strlen(commandTag) + 1);
 }

 /*
--
2.25.1
Re: tablecmds.c/MergeAttributes() cleanup
On Tue, Aug 29, 2023 at 08:44:02PM +0200, Alvaro Herrera wrote: > On 2023-Aug-29, Nathan Bossart wrote: >> My compiler is complaining about 1fa9241b: >> >> ../postgresql/src/backend/commands/sequence.c: In function ‘DefineSequence’: >> ../postgresql/src/backend/commands/sequence.c:196:21: error: ‘coldef’ may be >> used uninitialized in this function [-Werror=maybe-uninitialized] >> 196 | stmt->tableElts = lappend(stmt->tableElts, coldef); >> | ^~~~ >> >> This went away when I added a default case that ERROR'd or initialized >> coldef to NULL. > > Makes sense. However, maybe we should replace those ugly defines and > their hardcoded values in DefineSequence with a proper array with their > names and datatypes. That might be an improvement, but IIUC you'd still need to enumerate all of the columns or data types to make sure you use the right get-Datum function. It doesn't help that last_value uses Int64GetDatumFast and log_cnt uses Int64GetDatum. I could be missing something, though. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: tablecmds.c/MergeAttributes() cleanup
On 2023-Aug-29, Nathan Bossart wrote: > On Tue, Aug 29, 2023 at 10:43:39AM +0200, Peter Eisentraut wrote: > > I have committed a few more patches from this series that were already > > agreed upon. The remaining ones are rebased and reordered a bit, attached. > > My compiler is complaining about 1fa9241b: > > ../postgresql/src/backend/commands/sequence.c: In function ‘DefineSequence’: > ../postgresql/src/backend/commands/sequence.c:196:21: error: ‘coldef’ may be > used uninitialized in this function [-Werror=maybe-uninitialized] > 196 | stmt->tableElts = lappend(stmt->tableElts, coldef); > | ^~~~ > > This went away when I added a default case that ERROR'd or initialized > coldef to NULL. Makes sense. However, maybe we should replace those ugly defines and their hardcoded values in DefineSequence with a proper array with their names and datatypes. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Los trabajadores menos efectivos son sistematicamente llevados al lugar donde pueden hacer el menor daño posible: gerencia." (El principio Dilbert)
Re: tablecmds.c/MergeAttributes() cleanup
On Tue, Aug 29, 2023 at 10:43:39AM +0200, Peter Eisentraut wrote: > I have committed a few more patches from this series that were already > agreed upon. The remaining ones are rebased and reordered a bit, attached. My compiler is complaining about 1fa9241b: ../postgresql/src/backend/commands/sequence.c: In function ‘DefineSequence’: ../postgresql/src/backend/commands/sequence.c:196:21: error: ‘coldef’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 196 | stmt->tableElts = lappend(stmt->tableElts, coldef); | ^~~~ This went away when I added a default case that ERROR'd or initialized coldef to NULL. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Strange presentaion related to inheritance in \d+
On 2023-Aug-29, Kyotaro Horiguchi wrote:

> Attached is the initial version of the patch. It prevents "CREATE
> TABLE" from executing if there is an inconsistent not-null
> constraint. Also I noticed that "ALTER TABLE t ADD NOT NULL c NO
> INHERIT" silently ignores the "NO INHERIT" part and fixed it.

Great, thank you. I pushed it after modifying it a bit -- instead of
throwing the error in MergeAttributes, I did it in
AddRelationNotNullConstraints(). It seems cleaner this way, mostly
because we already have to match these two constraints there. (I guess
you could argue that we waste catalog-insertion work before the error is
reported and the whole thing is aborted; but I don't think this is a
serious problem in practice.)

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"En las profundidades de nuestro inconsciente hay una obsesiva necesidad
de un universo lógico y coherente. Pero el universo real se halla siempre
un paso más allá de la lógica" (Irulan)
Re: broken master regress tests
On Tue, Aug 29, 2023 at 17:54, Alvaro Herrera wrote:
> On 2023-Aug-27, Thomas Munro wrote:
>
> > On Sun, Aug 27, 2023 at 3:03 AM Pavel Stehule wrote:
> > > So it looks like IPC::Run::run ignores the parent environment
> >
> > I guess the new initdb template captures lc_messages in
> > postgresql.conf, when it runs earlier? I guess if you put
> > $node->append_conf('postgresql.conf', 'lc_messages=C'); into
> > src/bin/pg_amcheck/t/003_check.pl then it will work. I'm not sure
> > what the correct fix should be, ie if the template mechanism should
> > notice this difference and not use the template, or if tests that
> > depend on the message locale should explicitly say so with
> > lc_messages=C or similar (why is this the only one?), or ...
>
> So I tried this technique, but it gets old pretty fast: apparently
> there's a *ton* of tests that depend on the locale. I gave up after
> patching the first five files, and noticing that in a second run there
> were another half a dozen failing tests that hadn't failed the first
> time around. (Not sure why this happened.)
>
> So I think injecting --no-locale to the initdb line that creates the
> template is a better approach; something like the attached.

OK, thank you for fixing it.

Regards

Pavel

> --
> Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: pg_stat_get_backend_subxact() and backend IDs?
On Mon, Aug 28, 2023 at 10:53:52AM +0900, Michael Paquier wrote:
> I understand that this is not a fantastic naming, but renaming
> pgstat_fetch_stat_local_beentry() to something like
> pgstat_fetch_stat_local_beentry_by_{index|position}_id() would make
> the difference much easier to grasp, and we should avoid the use of
> "beid" when we refer to the *position/index ID* in
> localBackendStatusTable, because it is not a BackendId at all, just a
> position in the local array.

This was my first reaction [0]. I was concerned about renaming the
exported functions so close to release, so I was suggesting that we
hold off on that part until v17. If there isn't a concern with renaming
these functions in v16, I can proceed with something more like v2.

[0] https://postgr.es/m/20230824161913.GA1394441%40nathanxps13.lan

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: Wrong usage of pqMsg_Close message code?
Hi everyone,

Thanks for the report. I'll get this fixed up. My guess is that this
was leftover from an earlier version of the patch that used the same
macro for identical protocol characters.

On Tue, Aug 29, 2023 at 10:01:47AM -0400, Tom Lane wrote:
> Michael Paquier writes:
> > Actually, this may be OK as well as it stands. One can also say that
> > the parallel processing is out of this scope, being used only
> > internally. I cannot help wondering whether we should put more
> > effort into documenting the parallel worker/leader protocol. That's
> > internal to the backend and out of the scope of this thread, still..
>
> Yeah. I'm not sure whether the leader/worker protocol needs better
> documentation, but the parts of it that are not common with the
> frontend protocol should NOT be documented in protocol.sgml.
> That would just confuse authors of frontend code.
>
> I don't mind having constants for the leader/worker protocol in
> protocol.h, as long as they're in a separate section that's clearly
> marked as relevant only for server-internal parallelism.

+1, I left the parallel stuff (and a couple other things) out in the
first round to avoid prolonging the naming discussion, but we can
continue to add to protocol.h.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: broken master regress tests
On 2023-Aug-27, Thomas Munro wrote:

> On Sun, Aug 27, 2023 at 3:03 AM Pavel Stehule wrote:
> > So it looks like IPC::Run::run ignores the parent environment
>
> I guess the new initdb template captures lc_messages in
> postgresql.conf, when it runs earlier? I guess if you put
> $node->append_conf('postgresql.conf', 'lc_messages=C'); into
> src/bin/pg_amcheck/t/003_check.pl then it will work. I'm not sure
> what the correct fix should be, ie if the template mechanism should
> notice this difference and not use the template, or if tests that
> depend on the message locale should explicitly say so with
> lc_messages=C or similar (why is this the only one?), or ...

So I tried this technique, but it gets old pretty fast: apparently
there's a *ton* of tests that depend on the locale. I gave up after
patching the first five files, and noticing that in a second run there
were another half a dozen failing tests that hadn't failed the first
time around. (Not sure why this happened.)

So I think injecting --no-locale to the initdb line that creates the
template is a better approach; something like the attached.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

diff --git a/meson.build b/meson.build
index 8b2b521a01..5422885b0a 100644
--- a/meson.build
+++ b/meson.build
@@ -3107,7 +3107,7 @@ sys.exit(sp.returncode)
 ''',
       test_initdb_template,
       temp_install_bindir / 'initdb',
-      '-A', 'trust', '-N', '--no-instructions'
+      '-A', 'trust', '-N', '--no-instructions', '--no-locale'
     ],
     priority: setup_tests_priority - 1,
     timeout: 300,
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 0b4ca0eb6a..765b60db02 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -425,7 +425,7 @@ ifeq ($(MAKELEVEL),0)
 	$(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
 	$(MAKE) -j1 $(if $(CHECKPREP_TOP),-C $(CHECKPREP_TOP),) checkprep >>'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
-	$(with_temp_install) initdb -A trust -N --no-instructions '$(abs_top_builddir)'/tmp_install/initdb-template >>'$(abs_top_builddir)'/tmp_install/log/initdb-template.log 2>&1
+	$(with_temp_install) initdb -A trust -N --no-instructions --no-locale '$(abs_top_builddir)'/tmp_install/initdb-template >>'$(abs_top_builddir)'/tmp_install/log/initdb-template.log 2>&1
 endif
 endif
 endif
Re: should frontend tools use syncfs() ?
rebased -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 1bb28cfe5d40670386ae663e14c8854dc1b5486d Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Mon, 21 Aug 2023 19:05:14 -0700 Subject: [PATCH v6 1/2] move PG_TEMP_FILE* macros to file_utils.h --- src/backend/backup/basebackup.c | 2 +- src/backend/postmaster/postmaster.c | 1 + src/backend/storage/file/fileset.c | 1 + src/bin/pg_checksums/pg_checksums.c | 10 -- src/bin/pg_rewind/filemap.c | 2 +- src/include/common/file_utils.h | 4 src/include/storage/fd.h| 4 7 files changed, 8 insertions(+), 16 deletions(-) diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c index 45be21131c..5d66014499 100644 --- a/src/backend/backup/basebackup.c +++ b/src/backend/backup/basebackup.c @@ -25,6 +25,7 @@ #include "commands/defrem.h" #include "common/compression.h" #include "common/file_perm.h" +#include "common/file_utils.h" #include "lib/stringinfo.h" #include "miscadmin.h" #include "nodes/pg_list.h" @@ -37,7 +38,6 @@ #include "storage/bufpage.h" #include "storage/checksum.h" #include "storage/dsm_impl.h" -#include "storage/fd.h" #include "storage/ipc.h" #include "storage/reinit.h" #include "utils/builtins.h" diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index d7bfb28ff3..54e9bfb8c4 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -94,6 +94,7 @@ #include "access/xlogrecovery.h" #include "catalog/pg_control.h" #include "common/file_perm.h" +#include "common/file_utils.h" #include "common/ip.h" #include "common/pg_prng.h" #include "common/string.h" diff --git a/src/backend/storage/file/fileset.c b/src/backend/storage/file/fileset.c index e9951b0269..84cae32548 100644 --- a/src/backend/storage/file/fileset.c +++ b/src/backend/storage/file/fileset.c @@ -25,6 +25,7 @@ #include "catalog/pg_tablespace.h" #include "commands/tablespace.h" +#include "common/file_utils.h" #include "common/hashfn.h" #include 
"miscadmin.h" #include "storage/ipc.h" diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c index 19eb67e485..9011a19b4e 100644 --- a/src/bin/pg_checksums/pg_checksums.c +++ b/src/bin/pg_checksums/pg_checksums.c @@ -52,16 +52,6 @@ typedef enum PG_MODE_ENABLE } PgChecksumMode; -/* - * Filename components. - * - * XXX: fd.h is not declared here as frontend side code is not able to - * interact with the backend-side definitions for the various fsync - * wrappers. - */ -#define PG_TEMP_FILES_DIR "pgsql_tmp" -#define PG_TEMP_FILE_PREFIX "pgsql_tmp" - static PgChecksumMode mode = PG_MODE_CHECK; static const char *progname; diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c index bd5c598e20..58280d9abc 100644 --- a/src/bin/pg_rewind/filemap.c +++ b/src/bin/pg_rewind/filemap.c @@ -27,12 +27,12 @@ #include #include "catalog/pg_tablespace_d.h" +#include "common/file_utils.h" #include "common/hashfn.h" #include "common/string.h" #include "datapagemap.h" #include "filemap.h" #include "pg_rewind.h" -#include "storage/fd.h" /* * Define a hash table which we can use to store information about the files diff --git a/src/include/common/file_utils.h b/src/include/common/file_utils.h index b7efa1226d..dd1532bcb0 100644 --- a/src/include/common/file_utils.h +++ b/src/include/common/file_utils.h @@ -46,4 +46,8 @@ extern ssize_t pg_pwritev_with_retry(int fd, extern ssize_t pg_pwrite_zeros(int fd, size_t size, off_t offset); +/* Filename components */ +#define PG_TEMP_FILES_DIR "pgsql_tmp" +#define PG_TEMP_FILE_PREFIX "pgsql_tmp" + #endif /* FILE_UTILS_H */ diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h index 6791a406fc..aea30c0622 100644 --- a/src/include/storage/fd.h +++ b/src/include/storage/fd.h @@ -195,8 +195,4 @@ extern int durable_unlink(const char *fname, int elevel); extern void SyncDataDirectory(void); extern int data_sync_elevel(int elevel); -/* Filename components */ -#define PG_TEMP_FILES_DIR "pgsql_tmp" 
-#define PG_TEMP_FILE_PREFIX "pgsql_tmp" - #endif /* FD_H */ -- 2.25.1 >From 81e87db711bb90f4641b41b4f863f1f5064eb05d Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Fri, 28 Jul 2023 15:56:24 -0700 Subject: [PATCH v6 2/2] allow syncfs in frontend utilities --- doc/src/sgml/config.sgml | 12 +--- doc/src/sgml/filelist.sgml| 1 + doc/src/sgml/postgres.sgml| 1 + doc/src/sgml/ref/initdb.sgml | 22 +++ doc/src/sgml/ref/pg_basebackup.sgml | 25 doc/src/sgml/ref/pg_checksums.sgml| 22 +++ doc/src/sgml/ref/pg_dump.sgml | 21 +++ doc/src/sgml/ref/pg_rewind.sgml | 22 +++ doc/src/sgml/ref/pgupgrade.sgml | 23 +++ doc/src/sgml/syncfs.sgml | 36 +++ src/backend/storage/file/fd.c | 4 +- src/backend/utils/misc/guc_tables.c
Re: Eager page freeze criteria clarification
On Mon, Aug 28, 2023 at 4:30 PM Melanie Plageman wrote:

> By low-velocity, do you mean lower overall TPS? In that case, wouldn't
> you be less likely to run into xid wraparound and thus need less
> aggressive opportunistic freezing?

Yes. But it also means that we've got slack capacity to do extra work
now without really hurting anything. If we can leverage that capacity
to reduce the pain of future bulk operations, that seems good. When
resources are tight, doing speculative work immediately becomes less
appealing. It's pretty hard to take such things into account, though.
I was just mentioning it.

> So, this is where the caveat about absolute number of page freezes
> matters. In algorithm A, master only did 57 page freezes (spread
> across the various pgbench tables). At the end of the run, 2 pages
> were still frozen.

I'm increasingly feeling that it's hard to make sense of the ratio.
Maybe show the number of pages frozen, the number that are frozen at
the end, and the number of pages in the database at the end of the run
as three separate values.

> (1) seems bad to me because it doesn't consider whether or not
> freezing will be useful -- only if it will be cheap. It froze very
> little of the cold data in a workload where a small percentage of it
> was being modified (especially workloads A, C, H). And it froze a lot
> of data in workloads where it was being uniformly modified (workload
> B).

Sure, but if the cost of being wrong is low, you can be wrong a lot
and still be pretty happy. It's unclear to me to what extent we should
gamble on making only inexpensive mistakes and to what extent we
should gamble on making only infrequent mistakes, but they're both
valid strategies.

> I suggested (4) and (5) because I think the "older than 33%" threshold
> is better than the "older than 10%" threshold. I chose both because I
> am still unclear on our values. Are we willing to freeze more
> aggressively at the expense of emitting more FPIs? As long as it
> doesn't affect throughput? For pretty much all of these workloads, the
> algorithms which froze based on page modification recency OR FPI
> required emitted many more FPIs than those which froze based only on
> page modification recency.

Let's assume for a moment that the rate at which the insert LSN is
advancing is roughly constant over time, so that it serves as a good
proxy for wall-clock time. Consider four tables A, B, C, and D that
are, respectively, vacuumed once per minute, once per hour, once per
day, and once per week. With a 33% threshold, pages in table A will be
frozen if they haven't been modified in 20 seconds, pages in table B
will be frozen if they haven't been modified in 20 minutes, pages in
table C will be frozen if they haven't been modified in 8 hours, and
pages in table D will be frozen if they haven't been modified in 2
days, 8 hours. My intuition is that this feels awfully aggressive for
A and awfully passive for D.

To expand on that: apparently D doesn't actually get much write
activity, else there would be more vacuuming happening. So it's very
likely that pages in table D are going to get checkpointed and evicted
before they get modified again. Freezing them therefore seems like a
good bet: it's a lot cheaper to freeze those pages when they're
already in shared_buffers and dirty than it is if we have to read and
write them specifically for freezing.

It's less obvious that what we're doing in table A is wrong, and it
could be exactly right, but workloads where a row is modified, a human
thinks about something (e.g. whether to complete the proposed
purchase), and then the same row is modified again are not uncommon,
and human thinking times can easily exceed 20 seconds. On the other
hand, workloads where a row is modified, a computer thinks about
something, and then the row is modified again are also quite common,
and computer thinking times can easily be less than 20 seconds. It
feels like a toss-up whether we get it right.
For this kind of table, I suspect we'd be happier freezing pages that are about to be evicted or about to be written as part of a checkpoint rather than freezing pages opportunistically in vacuum. Maybe that's something we need to think harder about. If we froze dirty pages that wouldn't need a new FPI just before evicting them, and just before they would be written out for a checkpoint, under what circumstances would we still want vacuum to opportunistically freeze? I think the answer might be "only when it's very cheap." If it's very cheap to freeze now, it's appealing to gamble on doing it before a checkpoint comes along and makes the same operation require an extra FPI. But if it isn't too cheap, then why not just wait and see what happens? If the buffer gets modified again before it gets written out, then freeing immediately is a waste and freeze-on-evict is better. If it doesn't, freezing immediately and freeze-on-evict are the same price. I'm hand-waving a bit here because maybe freeze-on-evict
Re: Debian 12 gcc warning
Bruce Momjian writes:
> On Tue, Aug 29, 2023 at 10:26:27AM +0700, John Naylor wrote:
> > It looks like the former, since I can silence it on gcc 13 / -O1 by
> > doing:
> > /* keep compiler quiet */
> > actual_arg_types[0] = InvalidOid;

> Agreed, that fixes it for me too. In fact, assigning to only element
> 99 or 200 also prevents the warning, and considering the array is
> defined for 100 elements, the fact it accepts 200 isn't a good thing.
> Patch attached.

That seems like a pretty clear compiler bug, particularly since it
just appears in this one version. Rather than contorting our code,
I'd suggest filing a gcc bug.

regards, tom lane
Re: Wrong usage of pqMsg_Close message code?
Michael Paquier writes:
> Actually, this may be OK as well as it stands. One can also say that
> the parallel processing is out of this scope, being used only
> internally. I cannot help wondering whether we should put more effort
> into documenting the parallel worker/leader protocol. That's internal
> to the backend and out of the scope of this thread, still..

Yeah. I'm not sure whether the leader/worker protocol needs better
documentation, but the parts of it that are not common with the
frontend protocol should NOT be documented in protocol.sgml. That
would just confuse authors of frontend code.

I don't mind having constants for the leader/worker protocol in
protocol.h, as long as they're in a separate section that's clearly
marked as relevant only for server-internal parallelism.

regards, tom lane
Re: Allow specifying a dbname in pg_basebackup connection string
On Mon, 28 Aug 2023 at 23:50, Tristen Raab wrote:

> I've reviewed your patch and it applies and builds without error. When
> testing this patch I was slightly confused as to what its purpose was;
> after testing it I now understand. Initially, I thought this was a
> change to add database-level replication. I would suggest some
> clarifications to the documentation, such as changing:

Thanks for the review. I've updated the documentation to make it
clearer (using some of your suggestions but also some others).

v3-0001-Allow-specifying-a-dbname-in-pg_basebackup-connec.patch
Description: Binary data
Re: Query execution in Perl TAP tests needs work
On Tue, Aug 29, 2023 at 04:25:24PM +1200, Thomas Munro wrote:
> On Tue, Aug 29, 2023 at 1:48 PM Noah Misch wrote:
> > https://github.com/cpan-authors/IPC-Run/issues/166#issuecomment-1288190929
>
> Interesting. But that shows a case with no pipes connected, using
> select() as a dumb sleep and ignoring SIGCHLD. In our usage we have
> pipes connected, and I think select() should return when the child's
> output pipes become readable due to EOF. I guess something about that
> might be b0rked on Windows? I see there is an extra helper process
> doing socket<->pipe conversion (hah, that explains an extra ~10ms at
> the start in my timestamps)...

In that case, let's assume it's not the same issue.
Re: A failure in 031_recovery_conflict.pl on Debian/s390x
Re: Thomas Munro
> 2022), and then backpatched to all releases. They were disabled again
> in release branches 10-14 (discussion at
> https://postgr.es/m/3447060.1652032...@sss.pgh.pa.us):
>
> +plan skip_all => "disabled until after minor releases, due to instability";

Right:

https://pgdgbuild.dus.dg-i.net/view/Snapshot/job/postgresql-14-binaries-snapshot/366/architecture=s390x,distribution=sid/consoleText
-> t/031_recovery_conflict.pl ... skipped: disabled until after minor releases, due to instability

> Now your mainframe build bot is regularly failing for whatever timing
> reason, in 16 and master. That's quite useful because your tests have
> made us or at least me a lot more confident that the fix is good (one
> of the reasons progress has been slow is that failures in CI and BF
> were pretty rare and hard to analyse). But... I wonder why it isn't
> failing for you in 15? Are you able to check if it ever has? I
> suppose we could go and do the "disabled due to instability" thing in
> 15 and 16, and then later we'll un-disable it in 16 when we back-patch
> the fix into it.

The current explanation is that builds are only running frequently for
16 and devel, and the periodic extra test jobs only run "make check"
and the postgresql-common testsuite, not the full server testsuite.

I had "snapshot" jobs configured for basically everything (server and
extensions) for some time, but ultimately realized that I don't have
the slightest time to look at everything, so I disabled them again
about 3 weeks ago. This removed enough noise so I could actually look
at the failing 16 and devel jobs again.

I don't see any failures of that kind for the 15 snapshot build while
it was still running, the failures in there are all something else
(mostly problems outside PG).
https://pgdgbuild.dus.dg-i.net/view/Snapshot/job/postgresql-15-binaries-snapshot/

I'll enable the server snapshot builds again, those shouldn't be too
noisy.

Christoph
Re: logical_replication_mode
On 27.08.23 14:05, Zhijie Hou (Fujitsu) wrote:
> On Friday, August 25, 2023 5:56 PM Amit Kapila wrote:
> > On Fri, Aug 25, 2023 at 12:38 PM Peter Eisentraut wrote:
> > > On 25.08.23 08:52, Zhijie Hou (Fujitsu) wrote:
> > > > On Friday, August 25, 2023 12:28 PM Amit Kapila wrote:
> > > > > On Thu, Aug 24, 2023 at 12:45 PM Peter Eisentraut wrote:
> > > > > > I suggest we rename this setting to something starting with
> > > > > > debug_. Right now, the name looks much too tempting for users
> > > > > > to fiddle with. I think this is similar to force_parallel_mode.
> > > > >
> > > > > +1. How about debug_logical_replication?
> > > > >
> > > > > > Also, the descriptions in guc_tables.c could be improved. For
> > > > > > example, gettext_noop("Controls when to replicate or apply
> > > > > > each change."), is pretty content-free and unhelpful.
> > > > >
> > > > > The other possibility I could think of is to change short_desc
> > > > > as: "Allows to replicate each change for large transactions.".
> > > > > Do you have any better ideas?
> > > >
> > > > How about "Forces immediate streaming or serialization of changes
> > > > in large transactions.", which is similar to the description in
> > > > the document.
> > > >
> > > > I agree that renaming it to debug_xx would be better and here is
> > > > a patch that tries to do this.
> > >
> > > Maybe debug_logical_replication is too general? Something like
> > > debug_logical_replication_streaming would be more concrete.
> >
> > Yeah, that sounds better.
>
> OK, here is the debug_logical_replication_streaming version.

committed
Re: Debian 12 gcc warning
On Tue, Aug 29, 2023 at 10:26:27AM +0700, John Naylor wrote:
> On Tue, Aug 29, 2023 at 6:56 AM David Rowley wrote:
> >
> > I'm just not sure if it's unable to figure out if at least nargs
> > elements is set or if it won't be happy until all 100 elements are
> > set.
>
> It looks like the former, since I can silence it on gcc 13 / -O1 by doing:
>
> /* keep compiler quiet */
> actual_arg_types[0] = InvalidOid;

Agreed, that fixes it for me too. In fact, assigning to only element 99
or 200 also prevents the warning, and considering the array is defined
for 100 elements, the fact it accepts 200 isn't a good thing. Patch
attached.

I think the question is whether we add this to silence a common
compiler at a non-default optimization level. It is the only such case
in our source code right now.

--
Bruce Momjian https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index da258968b8..f4a1d1049c 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -4284,6 +4284,10 @@ recheck_cast_function_args(List *args, Oid result_type,
 	if (list_length(args) > FUNC_MAX_ARGS)
 		elog(ERROR, "too many function arguments");
 	nargs = 0;
+
+	/* Silence gcc 12 compiler at -O1. */
+	actual_arg_types[0] = InvalidOid;
+
 	foreach(lc, args)
 	{
 		actual_arg_types[nargs++] = exprType((Node *) lfirst(lc));
Re: Eliminate redundant tuple visibility check in vacuum
Hi David,

Thanks for taking a look!

On Tue, Aug 29, 2023 at 5:07 AM David Geier wrote:
> Hi Melanie,
>
> On 8/29/23 01:49, Melanie Plageman wrote:
> > While working on a set of patches to combine the freeze and
> > visibility map WAL records into the prune record, I wrote the
> > attached patches reusing the tuple visibility information collected
> > in heap_page_prune() back in lazy_scan_prune().
> >
> > heap_page_prune() collects the HTSV_Result for every tuple on a page
> > and saves it in an array used by heap_prune_chain(). If we make that
> > array available to lazy_scan_prune(), it can use it when collecting
> > stats for vacuum and determining whether or not to freeze tuples.
> > This avoids calling HeapTupleSatisfiesVacuum() again on every tuple
> > in the page.
> >
> > It also gets rid of the retry loop in lazy_scan_prune().
>
> How did you test this change?

I didn't add a new test, but you can confirm some existing test
coverage if you, for example, set every HTSV_Result in the array to
HEAPTUPLE_LIVE and run the regression tests. Tests relying on vacuum
removing the right tuples may fail. For example,

  select count(*) from gin_test_tbl where i @> array[1, 999];

in src/test/regress/sql/gin.sql fails for me since it sees a tuple it
shouldn't.

> Could you measure any performance difference?
>
> If so could you provide your test case?

I created a large table and then updated a tuple on every page in the
relation and vacuumed it. I saw a consistent slight improvement in
vacuum execution time. I profiled a bit with perf stat as well. The
difference is relatively small for this kind of example, but I can
work on a more compelling, realistic example. I think eliminating the
retry loop is also useful, as I have heard that users have had to
cancel vacuums which were in this retry loop for too long.

- Melanie
Re: pg_rewind WAL segments deletion pitfall
On 2023-08-24 09:45, Kyotaro Horiguchi wrote:

At Wed, 23 Aug 2023 13:44:52 +0200, Alexander Kukushkin wrote in

On Tue, 22 Aug 2023 at 07:32, Michael Paquier wrote:
> I don't like much this patch. While it takes correctly advantage of
> the backward record read logic from SimpleXLogPageRead() able to
> handle correctly timeline jumps, it creates a hidden dependency in the
> code between the hash table from filemap.c and the page callback.
> Wouldn't it be simpler to build a list of the segment names using the
> information from WALOpenSegment and build this list in
> findLastCheckpoint()?

I think the first version of the patch more or less did that. Not
necessarily a list, but a hash table of WAL file names that we want to
keep. But Kyotaro Horiguchi didn't like it and suggested creating
entries in the filemap.c hash table instead. But, I agree, doing it
directly from findLastCheckpoint() makes the code easier to understand.

...

> + /*
> +  * Some entries (WAL segments) already have an action assigned
> +  * (see SimpleXLogPageRead()).
> +  */
> + if (entry->action == FILE_ACTION_UNDECIDED)
> +     entry->action = decide_file_action(entry);
>
> This change makes me a bit uneasy, per se my previous comment with the
> additional code dependencies.

We can revert to the original approach (see
v1-0001-pg_rewind-wal-deletion.patch from the very first email) if you
like.

On the other hand, that approach brings in another source that
suggests the way that file should be handled. I still think that
entry->action should be the only source.

+1. Imagining a case when we come to need to decide how to treat files
based on yet another factor, I feel that a single source of truth is
better than creating a list or hash for each factor. However, it seems
I'm in the minority here. So I'm not tied to that approach.

> I think that this scenario deserves a test case. If one wants to
> emulate a delay in WAL archiving, it is possible to set
> archive_command to a command that we know will fail, for instance.

Yes, I totally agree, it is on our radar, but meanwhile please see the
new version, just to check if I correctly understood your idea.

Thanks for the patch. I tested the v4 patch using the script attached
in the below thread and it has successfully finished.
https://www.postgresql.org/message-id/2e75ae22dce9a227c3d47fa6d0ed094a%40oss.nttdata.com

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
Re: Standardize spelling of "power of two"
> On 29 Aug 2023, at 13:11, Alvaro Herrera wrote:
>
> On 2023-Aug-29, Daniel Gustafsson wrote:
>
>> Agreed. While we have numerous "power of 2" these were the only ones
>> in translated user-facing messages, so I've pushed this to master (it
>> didn't seem worth disrupting translations for 16 as we are so close
>> to wrapping it, if others disagree I can backpatch).
>
> I'd rather backpatch it. There's only 5 translations that are 100% for
> initdb.po, and they have two weeks to make the change from "2" to
> "two". I don't think this is a problem.

Fair enough, backpatched to v16.

--
Daniel Gustafsson
Re: subscription/015_stream sometimes breaks
On 2023-Aug-29, Amit Kapila wrote:

> On Mon, Aug 28, 2023 at 5:35 AM Peter Smith wrote:
> >
> > On Fri, Aug 25, 2023 at 8:15 PM Amit Kapila wrote:
> >
> > IMO there are inconsistencies in the second patch that was pushed.
>
> I find your suggestions reasonable. Alvaro, do you have any comments?

Well, my main comment is that at this point I'm not sure these
isFooWorker() macros are worth their salt. It looks like we could
replace their uses with direct type comparisons in their callsites and
remove them, with no loss of readability. The am_sth_worker() ones are
probably a bit more useful, though some of the callsites could end up
better if replaced with a straight type comparison.

All in all, I don't disagree with Peter's suggestions, but this is
pretty much in "meh" territory for me.

--
Álvaro Herrera         Breisgau, Deutschland — https://www.EnterpriseDB.com/
Re: Autogenerate some wait events code and documentation
On 2023-Aug-29, Michael Paquier wrote:

> On Tue, Aug 29, 2023 at 08:17:10AM +0200, Drouvot, Bertrand wrote:
> > Agree that done that way one could easily grep the events from the
> > source code and match them with wait_event_names.txt. Then I don't
> > think the "search" issue in the code is still a concern with the
> > current proposal.
>
> It could still be possible to append WAIT_EVENT_ to the first column
> of the file. I'd just rather keep it shorter.

Yeah, I have a mild preference for keeping the prefix, but it's mild
because I also imagine that if somebody doesn't see the full symbol
name when grepping they will think to remove the prefix. So only -0.1.

I think the DOCONLY stuff should be better documented; the entries make
no sense without looking at the commit message for fa88928470b5.

--
Álvaro Herrera         Breisgau, Deutschland — https://www.EnterpriseDB.com/
Use the same Windows image on both VS and MinGW tasks
Hi,

The VS and MinGW Windows images are merged on Andres' pg-vm-images
repository now [1]. So, the old pg-ci-windows-ci-vs-2019 and
pg-ci-windows-ci-mingw64 images will not be updated from now on. This
new merged image (pg-ci-windows-ci) needs to be used on both VS and
MinGW tasks. I attached a patch for using the pg-ci-windows-ci Windows
image on the VS and MinGW tasks.

CI run when pg-ci-windows-ci is used:
https://cirrus-ci.com/build/6063036847357952

[1]: https://github.com/anarazel/pg-vm-images/commit/6747f676b97348d47f041b05aa9b36cde43c33fe

Regards,
Nazir Bilal Yavuz
Microsoft

From 8db9f8672a9216f02e8e599626121c0de5befd59 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz
Date: Mon, 28 Aug 2023 13:23:41 +0300
Subject: [PATCH v1] windows: Use the same Windows image on both VS and MinGW
 tasks

The VS and MinGW Windows images are merged now [1]. So, the old
pg-ci-windows-ci-vs-2019 and pg-ci-windows-ci-mingw64 images will not
be updated from now on. This new merged image (pg-ci-windows-ci) needs
to be used on both VS and MinGW tasks.

[1]: https://github.com/anarazel/pg-vm-images/commit/6747f676b97348d47f041b05aa9b36cde43c33fe
---
 .cirrus.tasks.yml | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index e137769850d..66aee48a5d3 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -514,6 +514,7 @@ WINDOWS_ENVIRONMENT_BASE: &windows_environment_base
     PG_TEST_USE_UNIX_SOCKETS: 1
     PG_REGRESS_SOCK_DIR: "c:/cirrus/"
     DISK_SIZE: 50
+    IMAGE_FAMILY: pg-ci-windows-ci
 
   sysinfo_script: |
     chcp
@@ -537,7 +538,6 @@ task:
     # given that it explicitly prevents crash dumps from working...
     # 0x8001 is SEM_FAILCRITICALERRORS | SEM_NOOPENFILEERRORBOX
     CIRRUS_WINDOWS_ERROR_MODE: 0x8001
-    IMAGE_FAMILY: pg-ci-windows-ci-vs-2019
 
   <<: *windows_task_template
@@ -598,7 +598,6 @@ task:
     # Start bash in current working directory
     CHERE_INVOKING: 1
     BASH: C:\msys64\usr\bin\bash.exe -l
-    IMAGE_FAMILY: pg-ci-windows-ci-mingw64
 
   <<: *windows_task_template
-- 
2.40.1
Re: proposal: psql: show current user in prompt
On Mon, 28 Aug 2023 at 15:00, Pavel Stehule wrote: + minServerMajor = 1600; + serverMajor = PQserverVersion(pset.db) / 100; Instead of using the server version, we should instead use the protocol version negotiation that's provided by the NegotiateProtocolVersion message type. We should bump the requested protocol version from 3.0 to 3.1 and check that the server supports 3.1. Otherwise proxies or connection poolers might get this new message type, without knowing what to do with them. + +ReportGUC (F) We'll need some docs on the "Message Flow" page too: https://www.postgresql.org/docs/current/protocol-flow.html Specifically what response is expected, if any. Another thing that should be described there is that this falls outside of the transaction flow, i.e. it's changes are not reverted on ROLLBACK. But that leaves an important consideration: What happens when an error occurs on the server during handling of this message (e.g. the GUC does not exist or an OOM is triggered). Is any open transaction aborted in that case? If not, we should have a test for that. + if (PQresultStatus(res) != PGRES_COMMAND_OK) + pg_fatal("failed to link custom variable: %s", PQerrorMessage(conn)); + PQclear(res); The tests should also change the config after running PQlinkParameterStatus/PQunlinkParameterStatus to show that the guc is reported then or not reported then. + if (!PQsendTypedCommand(conn, PqMsg_ReportGUC, 't', paramName)) + return NULL; I think we'll need some bikeshedding on what the protocol message should look like exactly. I'm not entirely sure what is the most sensible here, so please treat everything I write next as suggestions/discussion: I see that you're piggy-backing on PQsendTypedCommand, which seems nice to avoid code duplication. It has one downside though: not every type, is valid for each command anymore. One way to avoid that would be to not introduce a new command, but only add a new type that is understood by Describe and Close, e.g. a 'G' (for GUC). 
Then PqMsg_Describe, G would become the equivalent of what the current patch expresses as PqMsg_ReportGUC, 't', and PqMsg_Close, G would be the same as PqMsg_ReportGUC, 'f'.

The rest of this email assumes that we're keeping your current proposal for the protocol message, so it might not make sense to address all of this feedback, in case we're still going to change the protocol:

+ if (is_set == 't')
+ {
+ SetGUCOptionFlag(name, GUC_REPORT);
+ status = "SET REPORT_GUC";
+ }
+ else
+ {
+ UnsetGUCOptionFlag(name, GUC_REPORT);
+ status = "UNSET REPORT_GUC";
+ }

I think we should be strict about what we accept here. Any other value than 'f'/'t' for is_set should result in an error imho.
Re: persist logical slots to disk during shutdown checkpoint
On Tue, Aug 29, 2023 at 2:21 PM Amit Kapila wrote: > > On Tue, Aug 29, 2023 at 10:16 AM vignesh C wrote: > > > > That makes sense. The attached v6 version has the changes for the > > same, apart from this I have also fixed a) pgindent issues b) perltidy > > issues c) one variable change (flush_lsn_changed to > > confirmed_flush_has_changed) d) corrected few comments in the test > > file. Thanks to Peter for providing few offline comments. > > > > The latest version looks good to me. Julien, Ashutosh, and others, > unless you have more comments or suggestions, I would like to push > this in a day or two. I am looking at it. If you can wait till the end of the week, that will be great. -- Best Wishes, Ashutosh Bapat
Re: tablecmds.c/MergeAttributes() cleanup
On 2023-Aug-29, Peter Eisentraut wrote: > From 471fda80c41fae835ecbe63ae8505526a37487a9 Mon Sep 17 00:00:00 2001 > From: Peter Eisentraut > Date: Wed, 12 Jul 2023 16:12:35 +0200 > Subject: [PATCH v2 04/10] Add TupleDescGetDefault() > > This unifies some repetitive code. > > Note: I didn't push the "not found" error message into the new > function, even though all existing callers would be able to make use > of it. Using the existing error handling as-is would probably require > exposing the Relation type via tupdesc.h, which doesn't seem > desirable. Note that all errors here are elog(ERROR), not user-facing, so ISTM it would be OK to change them to report the relation OID rather than the name. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "In fact, the basic problem with Perl 5's subroutines is that they're not crufty enough, so the cruft leaks out into user-defined code instead, by the Conservation of Cruft Principle." (Larry Wall, Apocalypse 6)
RE: [PoC] pg_upgrade: allow to upgrade publisher node
Dear Amit,

Thank you for giving comments! PSA new version. I ran the pgindent.

> > 1. check_and_dump_old_cluster
> >
> > CURRENT CODE (with v26-0003 patch applied)
> >
> > /* Extract a list of logical replication slots */
> > get_old_cluster_logical_slot_infos();
> >
> > ...
> >
> > /*
> > * Logical replication slots can be migrated since PG17. See comments atop
> > * get_old_cluster_logical_slot_infos().
> > */
> > if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
> > {
> > check_old_cluster_for_lost_slots();
> >
> > /*
> > * Do additional checks if a live check is not required. This requires
> > * that confirmed_flush_lsn of all the slots is the same as the latest
> > * checkpoint location, but it would be satisfied only when the server
> > * has been shut down.
> > */
> > if (!live_check)
> > check_old_cluster_for_confirmed_flush_lsn();
> > }
> >
> >
> > SUGGESTION
> >
> > /*
> > * Logical replication slots can be migrated since PG17. See comments atop
> > * get_old_cluster_logical_slot_infos().
> > */
> > if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700) // NOTE 1a.
> > {
> > /* Extract a list of logical replication slots */
> > get_old_cluster_logical_slot_infos();
> >
> > if (count_old_cluster_slots()) // NOTE 1b.
> > {
> > check_old_cluster_for_lost_slots();
> >
> > /*
> > * Do additional checks if a live check is not required. This requires
> > * that confirmed_flush_lsn of all the slots is the same as the latest
> > * checkpoint location, but it would be satisfied only when the server
> > * has been shut down.
> > */
> > if (!live_check)
> > check_old_cluster_for_confirmed_flush_lsn();
> > }
> > }
> >
> I think a slightly better way to achieve this is to combine the code
> from check_old_cluster_for_lost_slots() and
> check_old_cluster_for_confirmed_flush_lsn() into
> check_old_cluster_for_valid_slots(). That will even save us a new
> connection for the second check.

They are combined into one function.
> Also, I think we can simplify another check in the patch:
> @@ -1446,8 +1446,10 @@ check_new_cluster_logical_replication_slots(void)
>  char *wal_level;
>
>  /* Logical slots can be migrated since PG17. */
> - if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
> - nslots = count_old_cluster_logical_slots();
> + if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
> + return;
> +
> + nslots = count_old_cluster_logical_slots();
>

Fixed.

Also, I have tested the combination of this patch and a physical standby:

1. Logical slots defined on the old physical standby *cannot be upgraded*.
2. Logical slots defined on the physical primary *are migrated* to the new physical standby.

The primary reason is that pg_upgrade cannot be used for a physical standby. If users want to upgrade a standby, the rsync command is used instead. That command creates the cluster based on the new primary, hence the slots are replicated to the new standby. In contrast, the old cluster is basically ignored, so slots on the old cluster are not upgraded. I updated the doc accordingly.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

v28-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patch
Description: v28-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patch

v28-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch
Description: v28-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch

v28-0003-pg_upgrade-Add-check-function-for-logical-replic.patch
Description: v28-0003-pg_upgrade-Add-check-function-for-logical-replic.patch
Re: tablecmds.c/MergeAttributes() cleanup
On 2023-Aug-29, Peter Eisentraut wrote: Regarding this hunk in 0002, > @@ -3278,13 +3261,16 @@ MergeAttributes(List *schema, List *supers, char > relpersistence, > * > * constraints is a list of CookedConstraint structs for previous > constraints. > * > - * Returns true if merged (constraint is a duplicate), or false if it's > - * got a so-far-unique name, or throws error if conflict. > + * If the constraint is a duplicate, then the existing constraint's > + * inheritance count is updated. If the constraint doesn't match or conflict > + * with an existing one, a new constraint is appended to the list. If there > + * is a conflict (same name but different expression), throw an error. This wording confused me: "If the constraint doesn't match or conflict with an existing one, a new constraint is appended to the list." I first read it as "doesn't match or conflicts with ..." (i.e., the negation only applied to the first verb, not both) which would have been surprising (== broken) behavior. I think it's clearer if you say "doesn't match nor conflict", but I'm not sure if this is grammatically correct. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Re: Support prepared statement invalidation when result types change
On Tue, 29 Aug 2023 at 11:29, jian he wrote: > regression=# CREATE TEMP TABLE pcachetest AS SELECT * FROM int8_tbl; > SELECT 5 > regression=# PREPARE prepstmt2(bigint) AS SELECT * FROM pcachetest > WHERE q1 = $1;' > PREPARE > regression=# alter table pcachetest rename q1 to x; > ALTER TABLE > regression=# EXECUTE prepstmt2(123); > 2023-08-29 17:23:59.148 CST [1382074] ERROR: column "q1" does not > exist at character 61 > 2023-08-29 17:23:59.148 CST [1382074] HINT: Perhaps you meant to > reference the column "pcachetest.q2". > 2023-08-29 17:23:59.148 CST [1382074] STATEMENT: EXECUTE prepstmt2(123); > ERROR: column "q1" does not exist > HINT: Perhaps you meant to reference the column "pcachetest.q2". Thank you for the full set of commands. In that case the issue you're describing is completely separate from this patch. The STATEMENT: part of the log will always show the query that was received by the server. This behaviour was already present even without my patch (I double checked with PG15.3).
Re: Standardize spelling of "power of two"
On 2023-Aug-29, Daniel Gustafsson wrote:

> Agreed. While we have numerous "power of 2" these were the only ones
> in translated user-facing messages, so I've pushed this to master (it
> didn't seem worth disrupting translations for 16 as we are so close to
> wrapping it, if others disagree I can backpatch).

I'd rather backpatch it. There's only 5 translations that are 100% for initdb.po, and they have two weeks to make the change from "2" to "two". I don't think this is a problem.

-- 
Álvaro Herrera        Breisgau, Deutschland — https://www.EnterpriseDB.com/
Re: Support run-time partition pruning for hash join
On Fri, Aug 25, 2023 at 11:03 AM David Rowley wrote:

> I'd suggest writing some cost which costs an execution of run-time
> pruning. With LIST and RANGE you probably want something like
> cpu_operator_cost * LOG2(nparts) once for each hashed tuple to account
> for the binary search over the sorted datum array. For HASH
> partitions, something like cpu_operator_cost * npartcols once for each
> hashed tuple.
>
> You'll need to then come up with some counter costs to subtract from
> the Append/MergeAppend. This is tricky, as discussed. Just come up
> with something crude for now.
>
> To start with, it could just be as crude as:
>
> total_costs *= (Min(expected_outer_rows, n_append_subnodes) /
> n_append_subnodes);
>
> i.e assume that every outer joined row will require exactly 1 new
> partition up to the total number of partitions. That's pretty much
> worst-case, but it'll at least allow the optimisation to work for
> cases like where the hash table is expected to contain just a tiny
> number of rows (fewer than the number of partitions)
>
> To make it better, you might want to look at join selectivity
> estimation and see if you can find something there to influence
> something better.

I had a go at writing some costing code according to your suggestion. That's compute_partprune_cost() in the v2 patch.

For the hash side, this function computes the pruning cost as cpu_operator_cost * LOG2(nparts) * inner_rows for LIST and RANGE, and cpu_operator_cost * nparts * inner_rows for HASH.

For the Append/MergeAppend side, this function first estimates the size of the outer side that matches, using the same idea as when we estimate the joinrel size for JOIN_SEMI. Then it assumes that each outer joined row occupies one new partition (the worst case) and computes how much cost can be saved from partition pruning. If the cost saved from the Append/MergeAppend side is larger than the pruning cost from the Hash side, then we say that partition pruning is a win.
Note that this costing logic runs for each Append-Hash pair, so it copes with the case where we have multiple join levels.

With this costing logic added, I performed the same performance comparisons of the worst case as in [1], and here is what I got.

tuples   unpatched   patched
1        44.66       44.37     -0.006493506
2        52.41       52.29     -0.002289639
3        61.11       61.12     +0.000163639
4        67.87       68.24     +0.005451599
5        74.51       74.75     +0.003221044
6        82.3        81.55     -0.009113001
7        87.16       86.98     -0.002065168
8        93.49       93.89     +0.004278532
9        101.52      100.83    -0.00679669
10       108.34      108.56    +0.002030644

So the costing logic successfully avoids performing the partition pruning in the worst case.

I also tested the cases where partition pruning is possible with different sizes of the hash side.

tuples   unpatched   patched
100      36.86       2.4       -0.934888768
200      35.87       2.37      -0.933928074
300      35.95       2.55      -0.92906815
400      36.4        2.63      -0.927747253
500      36.39       2.85      -0.921681781
600      36.32       2.97      -0.918226872
700      36.6        3.23      -0.911748634
800      36.88       3.44      -0.906724512
900      37.02       3.46      -0.906537007
1000     37.25       37.21     -0.001073826

The first 9 rows show that the costing logic allows the partition pruning to be performed and the pruning turns out to be a big win. The last row shows that the partition pruning is disallowed by the costing logic because it thinks no partition can be pruned (we have 1000 partitions in total).

So it seems that the new costing logic is quite crude and tends to be very conservative, but it can help avoid the large overhead in the worst cases. I think this might be a good start to push this patch forward. Any thoughts or comments?

[1] https://www.postgresql.org/message-id/CAMbWs49%2Bp6hBxXJHFiSwOtPCSkAHwhJj3hTpCR_pmMiUUVLZ1Q%40mail.gmail.com

Thanks
Richard

v2-0001-Support-run-time-partition-pruning-for-hash-join.patch
Description: Binary data
Re: Sync scan & regression tests
(noticed this thread just now)

On 07/08/2023 03:55, Tom Lane wrote:
> Having said that ... I just noticed that chipmunk, which I'd been
> ignoring because it had been having configuration-related failures ever
> since it came back to life about three months ago, has gotten past those
> problems

Yes, I finally got around to fixing the configuration...

> and is now failing with what looks suspiciously like syncscan problems:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2023-08-06%2008%3A20%3A21
>
> This is possibly explained by the fact that it uses (per its
> extra_config) 'shared_buffers = 10MB', although it's done that for a
> long time and portals.out hasn't changed since before chipmunk's latest
> success. Perhaps some change in an earlier test script affected this?

I changed the config yesterday to 'shared_buffers = 20MB', before seeing this thread. If we expect the regression tests to work with 'shared_buffers=10MB' - and I think that would be nice - I'll change it back. But let's see what happens with 'shared_buffers=20MB'. It will take a few days for the tests to complete.

> I think it'd be worth running to ground exactly what is causing these
> failures and when they started.

I bisected it to this commit:

commit 7ae0ab0ad9704b10400a611a9af56c63304c27a5
Author: David Rowley
Date: Fri Feb 3 16:20:43 2023 +1300

    Reduce code duplication between heapgettup and heapgettup_pagemode

    The code to get the next block number was exactly the same between
    these two functions, so let's just put it into a helper function and
    call that from both locations.

    Author: Melanie Plageman
    Reviewed-by: Andres Freund, David Rowley
    Discussion: https://postgr.es/m/CAAKRu_bvkhka0CZQun28KTqhuUh5ZqY=_t8qeqzqol02rpi...@mail.gmail.com

From that description, that was supposed to be just code refactoring, with no change in behavior.
Looking at the new heapgettup_advance_block() function and the code that it replaced, it's now skipping this ss_report_location() on the last call, when it has reached the end of the scan:

		/*
		 * Report our new scan position for synchronization purposes. We
		 * don't do that when moving backwards, however. That would just
		 * mess up any other forward-moving scanners.
		 *
		 * Note: we do this before checking for end of scan so that the
		 * final state of the position hint is back at the start of the
		 * rel. That's not strictly necessary, but otherwise when you run
		 * the same query multiple times the starting position would shift
		 * a little bit backwards on every invocation, which is confusing.
		 * We don't guarantee any specific ordering in general, though.
		 */
		if (scan->rs_base.rs_flags & SO_ALLOW_SYNC)
			ss_report_location(scan->rs_base.rs_rd, block);

The comment explicitly says that we do that before checking for end of scan, but that commit moved it to after. I propose the attached to move it back to before the end-of-scan checks. Melanie, David, any thoughts?

> But assuming that they are triggered by syncscan, my first thought about
> dealing with it is to disable syncscan within the affected tests.
> However ... do we exercise the syncscan logic at all within the
> regression tests, normally? Is there a coverage issue to be dealt with?

Good question..
-- 
Heikki Linnakangas
Neon (https://neon.tech)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3bb1d5cff68..48b4ca45439 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -647,17 +647,6 @@ heapgettup_advance_block(HeapScanDesc scan, BlockNumber block, ScanDirection dir
 		if (block >= scan->rs_nblocks)
 			block = 0;
 
-		/* we're done if we're back at where we started */
-		if (block == scan->rs_startblock)
-			return InvalidBlockNumber;
-
-		/* check if the limit imposed by heap_setscanlimits() is met */
-		if (scan->rs_numblocks != InvalidBlockNumber)
-		{
-			if (--scan->rs_numblocks == 0)
-				return InvalidBlockNumber;
-		}
-
 		/*
 		 * Report our new scan position for synchronization purposes. We
 		 * don't do that when moving backwards, however. That would just
@@ -673,6 +662,17 @@ heapgettup_advance_block(HeapScanDesc scan, BlockNumber block, ScanDirection dir
 		if (scan->rs_base.rs_flags & SO_ALLOW_SYNC)
 			ss_report_location(scan->rs_base.rs_rd, block);
 
+		/* we're done if we're back at where we started */
+		if (block == scan->rs_startblock)
+			return InvalidBlockNumber;
+
+		/* check if the limit imposed by heap_setscanlimits() is met */
+		if (scan->rs_numblocks != InvalidBlockNumber)
+		{
+			if (--scan->rs_numblocks == 0)
+				return InvalidBlockNumber;
+		}
+
 		return block;
 	}
 	else
Re: Prevent psql \watch from running queries that return no rows
> On 22 Aug 2023, at 23:23, Greg Sabino Mullane wrote: > > Thank you for the feedback, everyone. Attached is version 4 of the patch, > featuring a few tests and minor rewordings. I went over this once more, and pushed it along with pgindenting. I did reduce the number of tests since they were overlapping and a bit too expensive to have multiples of. Thanks for the patch! -- Daniel Gustafsson
Re: [PATCH v1] PQputCopyEnd never returns 0, fix the inaccurate comment
On Tue, Aug 29, 2023 at 6:40 AM Michael Paquier wrote: > > On Mon, Aug 28, 2023 at 09:46:07PM +0800, Junwang Zhao wrote: > > Yeah, it makes sense to me, or maybe just `PQputCopyEnd(...) == -1`, > > let's wait for some other opinions. > > One can argue that PQputCopyEnd() returning 0 could be possible in an > older version of libpq these callers are linking to, but this has > never existed from what I can see (just checked down to 8.2 now). > Anyway, changing these callers may create some backpatching conflicts, > so I'd let them as they are, and just fix the comment. sure, thanks for the explanation. > -- > Michael -- Regards Junwang Zhao
RE: logical_replication_mode
On Tuesday, August 29, 2023 3:26 PM Peter Smith wrote: Thanks for reviewing. > 2. DebugLogicalRepStreamingMode > > -/* possible values for logical_replication_mode */ > +/* possible values for debug_logical_replication_streaming */ > typedef enum > { > - LOGICAL_REP_MODE_BUFFERED, > - LOGICAL_REP_MODE_IMMEDIATE > -} LogicalRepMode; > + DEBUG_LOGICAL_REP_STREAMING_BUFFERED, > + DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE > +} DebugLogicalRepStreamingMode; > > Shouldn't this typedef name be included in the typedef.list file? I think it's unnecessary to add this as there currently is no reference to the name. See other similar examples like DebugParallelMode, RecoveryPrefetchValue ... And the name is also not included in BF[1] [1] https://buildfarm.postgresql.org/cgi-bin/typedefs.pl?branch=HEAD Best Regards, Hou zj
Re: Support prepared statement invalidation when result types change
On Tue, Aug 29, 2023 at 12:51 AM Jelte Fennema wrote: > > Could you share the full set of commands that cause the reporting > issue? I don't think my changes should impact this reporting, so I'm > curious if this is a new issue, or an already existing one. I didn't apply your v2 patch. full set of commands: regression=# CREATE TEMP TABLE pcachetest AS SELECT * FROM int8_tbl; SELECT 5 regression=# PREPARE prepstmt2(bigint) AS SELECT * FROM pcachetest WHERE q1 = $1;' PREPARE regression=# alter table pcachetest rename q1 to x; ALTER TABLE regression=# EXECUTE prepstmt2(123); 2023-08-29 17:23:59.148 CST [1382074] ERROR: column "q1" does not exist at character 61 2023-08-29 17:23:59.148 CST [1382074] HINT: Perhaps you meant to reference the column "pcachetest.q2". 2023-08-29 17:23:59.148 CST [1382074] STATEMENT: EXECUTE prepstmt2(123); ERROR: column "q1" does not exist HINT: Perhaps you meant to reference the column "pcachetest.q2". --
Re: Standardize spelling of "power of two"
> On 29 Aug 2023, at 10:56, Kyotaro Horiguchi wrote: > pg_resetwal and initdb has an error message like this: > > msgid "argument of --wal-segsize must be a power of 2 between 1 and 1024" > > In other parts in the tree, however, we spell it as "power of two". I > think it would make sense to standardize the spelling for > consistency. See the attached. Agreed. While we have numerous "power of 2" these were the only ones in translated user-facing messages, so I've pushed this to master (it didn't seem worth disrupting translations for 16 as we are so close to wrapping it, if others disagree I can backpatch). -- Daniel Gustafsson
Re: Missing comments/docs about custom scan path
On Thu, Aug 3, 2023 at 6:01 PM Etsuro Fujita wrote: > On Mon, Jul 31, 2023 at 7:05 PM Etsuro Fujita wrote: > > While working on [1], I noticed $SUBJECT: Another thing I would like to propose is minor adjustments to the docs related to parallel query: A custom scan provider will typically add paths for a base relation by setting the following hook, which is called after the core code has generated all the access paths it can for the relation (except for Gather paths, which are made after this call so that they can use partial paths added by the hook): For clarity, I think "except for Gather paths" should be "except for Gather and Gather Merge paths". Although this hook function can be used to examine, modify, or remove paths generated by the core system, a custom scan provider will typically confine itself to generating CustomPath objects and adding them to rel using add_path. For clarity, I think "adding them to rel using add_path" should be eg, "adding them to rel using add_path, or using add_partial_path if they are partial paths". Attached is a patch for that. Best regards, Etsuro Fujita further-update-custom-scan-path-docs.patch Description: Binary data
Re: Eliminate redundant tuple visibility check in vacuum
Hi Melanie, On 8/29/23 01:49, Melanie Plageman wrote: While working on a set of patches to combine the freeze and visibility map WAL records into the prune record, I wrote the attached patches reusing the tuple visibility information collected in heap_page_prune() back in lazy_scan_prune(). heap_page_prune() collects the HTSV_Result for every tuple on a page and saves it in an array used by heap_prune_chain(). If we make that array available to lazy_scan_prune(), it can use it when collecting stats for vacuum and determining whether or not to freeze tuples. This avoids calling HeapTupleSatisfiesVacuum() again on every tuple in the page. It also gets rid of the retry loop in lazy_scan_prune(). How did you test this change? Could you measure any performance difference? If so could you provide your test case? -- David Geier (ServiceNow)
Standardize spelling of "power of two"
Hello.

pg_resetwal and initdb have an error message like this:

msgid "argument of --wal-segsize must be a power of 2 between 1 and 1024"

In other parts of the tree, however, we spell it as "power of two". I think it would make sense to standardize the spelling for consistency. See the attached.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index c66467eb95..905b979947 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -3350,7 +3350,7 @@ main(int argc, char *argv[])
 	check_need_password(authmethodlocal, authmethodhost);
 
 	if (!IsValidWalSegSize(wal_segment_size_mb * 1024 * 1024))
-		pg_fatal("argument of %s must be a power of 2 between 1 and 1024", "--wal-segsize");
+		pg_fatal("argument of %s must be a power of two between 1 and 1024", "--wal-segsize");
 
 	get_restricted_token();
 
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 9bebc2a995..25ecdaaa15 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -298,7 +298,7 @@ main(int argc, char *argv[])
 				exit(1);
 			set_wal_segsize = wal_segsize_mb * 1024 * 1024;
 			if (!IsValidWalSegSize(set_wal_segsize))
-				pg_fatal("argument of %s must be a power of 2 between 1 and 1024", "--wal-segsize");
+				pg_fatal("argument of %s must be a power of two between 1 and 1024", "--wal-segsize");
 			break;
 		}
Re: persist logical slots to disk during shutdown checkpoint
On Tue, Aug 29, 2023 at 10:16 AM vignesh C wrote: > > That makes sense. The attached v6 version has the changes for the > same, apart from this I have also fixed a) pgindent issues b) perltidy > issues c) one variable change (flush_lsn_changed to > confirmed_flush_has_changed) d) corrected few comments in the test > file. Thanks to Peter for providing few offline comments. > The latest version looks good to me. Julien, Ashutosh, and others, unless you have more comments or suggestions, I would like to push this in a day or two. -- With Regards, Amit Kapila.
Re: Remove IndexInfo.ii_OpclassOptions field
On 25.08.23 03:31, Michael Paquier wrote: On Thu, Aug 24, 2023 at 08:57:58AM +0200, Peter Eisentraut wrote: During some refactoring I noticed that the field IndexInfo.ii_OpclassOptions is kind of useless. The IndexInfo struct is notionally an executor support node, but this field is not used in the executor or by the index AM code. It is really just used in DDL code in index.c and indexcmds.c to pass information around locally. For that, it would be clearer to just use local variables, like for other similar cases. With that change, we can also remove RelationGetIndexRawAttOptions(), which only had one caller left, for which it was overkill. I am not so sure. There is a very recent thread where it has been pointed out that we have zero support for relcache invalidation with index options, causing various problems: https://www.postgresql.org/message-id/CAGem3qAM7M7B3DdccpgepRxuoKPd2Y74qJ5NSNRjLiN21dPhgg%40mail.gmail.com Perhaps we'd better settle on the other one before deciding if the change you are proposing here is adapted or not. Ok, I'll wait for the resolution of that. At a glance, however, I think my patch is (a) not related, and (b) if it were, it would probably *help*, because the change is to not allocate any long-lived structures that no one needs and that might get out of date.
Re: postgres_fdw: wrong results with self join + enable_nestloop off
Hi, Thanks for the detailed explanation! On Mon, Aug 21, 2023 at 10:34 PM Önder Kalacı wrote: >> As described in the commit message, we assume that extensions use the >> hook in a similar way to FDWs > I'm not sure if it is fair to assume that extensions use any hook in any way. I am not sure either, but as for the hook, I think it is an undeniable fact that the core system assumes that extensions will use it in that way. >> So my question is: does the Citus extension use the hook like this? >> (Sorry, I do not fully understand Onder's explanation.) > I haven't gone into detail about how Citus uses this hook, but I don't think > we should > need to explain it. In general, Citus uses many hooks, and many other > extensions > use this specific hook. With minor version upgrades, we haven't seen this > kind of > behavior change before. > > In general, Citus relies on this hook for collecting information about joins > across > relations/ctes/subqueries. So, its scope is bigger than a single join for > Citus. > > The extension assigns a special marker(s) for RTE Relations, and then checks > whether > all relations with these special markers joined transitively across > subqueries, such that > it can decide to pushdown the whole or some parts of the (sub)query. IIUC, I think that that is going beyond what the hook supports. > But the bigger issue is that there has usually been a clear line between the > extensions and > the PG itself when it comes to hooks within the minor version upgrades. > Sadly, this change > breaks that line. We wanted to share our worries here and find out what > others think. My understanding is: at least for hooks with intended usages, if an extension uses them as intended, it is guaranteed that the extension as-is will work correctly with minor version upgrades; otherwise it is not necessarily. I think it is unfortunate that my commit broke the Citus extension, though. 
>> >Except that this was only noticed after it was released in a set of minor >> > versions, I would say that 6f80a8d9c should just straight up be reverted. > I cannot be the one to ask for reverting a commit in PG, but I think doing it > would be a > fair action. We kindly ask those who handle this to think about it. Reverting the commit would resolve your issue, but re-introduce the issue mentioned upthread to extensions that use the hook properly, so I do not think that reverting the commit would be a fair action. Sorry for the delay. Best regards, Etsuro Fujita
Re: tablecmds.c/MergeAttributes() cleanup
On 12.07.23 16:29, Peter Eisentraut wrote: On 11.07.23 20:17, Alvaro Herrera wrote: I spent a few minutes doing a test merge of this to my branch with NOT NULL changes. Here's a quick review. Subject: [PATCH 01/17] Remove obsolete comment about OID support Obvious, trivial. +1 Subject: [PATCH 02/17] Remove ancient special case code for adding oid columns LGTM; deletes dead code. Subject: [PATCH 03/17] Remove ancient special case code for dropping oid columns Hmm, interesting. Yay for more dead code removal. Didn't verify it. I have committed these first three. I'll leave it at that for now. I have committed a few more patches from this series that were already agreed upon. The remaining ones are rebased and reordered a bit, attached. There was some doubt about the patch "Refactor ATExecAddColumn() to use BuildDescForRelation()" (now 0009), whether it's too clever to build a fake one-item tuple descriptor. I am working on another patch, which I hope to post this week, that proposes to replace the use of tuple descriptors there with a List of something. That would then allow maybe rewriting this in a less-clever way. That patch incidentally also wants to move BuildDescForRelation from tupdesc.c to tablecmds.c (patch 0007 here). From 4d01d244305c63a5a602558267d0021910e200b6 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Wed, 12 Jul 2023 16:12:34 +0200 Subject: [PATCH v2 01/10] Clean up MergeAttributesIntoExisting() Make variable naming clearer and more consistent. Move some variables to smaller scope. Remove some unnecessary intermediate variables. Try to save some vertical space. Apply analogous changes to nearby MergeConstraintsIntoExisting() and RemoveInheritance() for consistency. 
---
 src/backend/commands/tablecmds.c | 123 ---
 1 file changed, 48 insertions(+), 75 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index d097da3c78..e7846178b3 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -15691,92 +15691,76 @@ static void
 MergeAttributesIntoExisting(Relation child_rel, Relation parent_rel)
 {
 	Relation	attrrel;
-	AttrNumber	parent_attno;
-	int			parent_natts;
-	TupleDesc	tupleDesc;
-	HeapTuple	tuple;
-	bool		child_is_partition = false;
+	TupleDesc	parent_desc;
 
 	attrrel = table_open(AttributeRelationId, RowExclusiveLock);
+	parent_desc = RelationGetDescr(parent_rel);
 
-	tupleDesc = RelationGetDescr(parent_rel);
-	parent_natts = tupleDesc->natts;
-
-	/* If parent_rel is a partitioned table, child_rel must be a partition */
-	if (parent_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-		child_is_partition = true;
-
-	for (parent_attno = 1; parent_attno <= parent_natts; parent_attno++)
+	for (AttrNumber parent_attno = 1; parent_attno <= parent_desc->natts; parent_attno++)
 	{
-		Form_pg_attribute attribute = TupleDescAttr(tupleDesc,
-													parent_attno - 1);
-		char	   *attributeName = NameStr(attribute->attname);
+		Form_pg_attribute parent_att = TupleDescAttr(parent_desc, parent_attno - 1);
+		char	   *parent_attname = NameStr(parent_att->attname);
+		HeapTuple	tuple;
 
 		/* Ignore dropped columns in the parent. */
-		if (attribute->attisdropped)
+		if (parent_att->attisdropped)
 			continue;
 
 		/* Find same column in child (matching on column name). */
-		tuple = SearchSysCacheCopyAttName(RelationGetRelid(child_rel),
-										  attributeName);
+		tuple = SearchSysCacheCopyAttName(RelationGetRelid(child_rel), parent_attname);
 		if (HeapTupleIsValid(tuple))
 		{
-			/* Check they are same type, typmod, and collation */
-			Form_pg_attribute childatt = (Form_pg_attribute) GETSTRUCT(tuple);
+			Form_pg_attribute child_att = (Form_pg_attribute) GETSTRUCT(tuple);
 
-			if (attribute->atttypid != childatt->atttypid ||
-				attribute->atttypmod != childatt->atttypmod)
+			if (parent_att->atttypid != child_att->atttypid ||
+				parent_att->atttypmod != child_att->atttypmod)
 				ereport(ERROR,
 						(errcode(ERRCODE_DATATYPE_MISMATCH),
 						 errmsg("child table \"%s\" has different type for column \"%s\"",
-
Re: logical_replication_mode
On Tue, Aug 29, 2023 at 12:56 PM Peter Smith wrote:
>
> I had a look at the patch 0001.
>
> It looks OK to me, but here are a couple of comments:
>
> ======
>
> 1. Is this fix intended for PG16?

Yes.

> I found some mention of this GUC old name lurking in the release v16
> notes [1].

That should be changed as well but we can do that as a separate patch
just for v16.

--
With Regards,
Amit Kapila.
Re: Wrong usage of pqMsg_Close message code?
On Tue, Aug 29, 2023 at 10:04:24AM +0900, Tatsuo Ishii wrote:
>> Yeah, both of you are right here.  Anyway, it seems to me that there
>> is a bit more going on in protocol.h.  I have noticed two more things
>> that are incorrect:
>> - HandleParallelMessage is missing a message for 'P', but I think that
>> we should have a code for it as well as part of the parallel query
>> protocol.
>
> I did not know this.  Why is this not explained in the frontend/backend
> protocol document?

Hmm.  Thinking more about it, I am actually not sure that we need to do
that in this case, so perhaps things are OK as they stand for this one.

>> - PqMsg_Terminate can be sent by the frontend *and* the backend, see
>> fe-connect.c and parallel.c.  However, protocol.h documents it as a
>> frontend-only code.
>
> I do not blame protocol.h because our frontend/backend protocol
> document also states that it's a frontend-only message.  Someone who
> started to use 'X' in the backend should have added that in the
> documentation.

Actually, this may be OK as well as it stands.  One can also say that
the parallel processing is out of this scope, being used only
internally.  I cannot help wondering whether we should put more effort
into documenting the parallel worker/leader protocol.  That's internal
to the backend and out of the scope of this thread, still..

--
Michael
Re: logical_replication_mode
Hi Hou-san.

I had a look at the patch 0001.

It looks OK to me, but here are a couple of comments:

======

1. Is this fix intended for PG16?

I found some mention of this GUC old name lurking in the release v16
notes [1].

~~~

2. DebugLogicalRepStreamingMode

-/* possible values for logical_replication_mode */
+/* possible values for debug_logical_replication_streaming */
 typedef enum
 {
-	LOGICAL_REP_MODE_BUFFERED,
-	LOGICAL_REP_MODE_IMMEDIATE
-} LogicalRepMode;
+	DEBUG_LOGICAL_REP_STREAMING_BUFFERED,
+	DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE
+} DebugLogicalRepStreamingMode;

Shouldn't this typedef name be included in the typedef.list file?

------
[1] https://www.postgresql.org/docs/16/release-16.html

Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)
On 29/08/2023 09:21, Heikki Linnakangas wrote:
> Thinking about this some more, the ListenSockets array is a bit silly
> anyway.  We fill the array starting from index 0, always append to the
> end, and never remove entries from it.  It would seem more
> straightforward to keep track of the used size of the array.  Currently
> we always loop through the unused parts too, and e.g.
> ConfigurePostmasterWaitSet() needs to walk the array to count how many
> elements are in use.

Like this.

--
Heikki Linnakangas
Neon (https://neon.tech)

From 796280f07dd5dbf50897c9895715ab5e2dbf187b Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas
Date: Tue, 29 Aug 2023 09:53:08 +0300
Subject: [PATCH 1/1] Refactor ListenSocket array.

Keep track of the used size of the array.  That avoids looping through
the whole array in a few places.  It doesn't matter from a performance
point of view since the array is small anyway, but this feels less
surprising and is a little less code.

Now that we have an explicit NumListenSockets variable that is
statically initialized to 0, we don't need the loop to initialize the
array.

Allocate the array in PostmasterContext.  The array isn't needed in
child processes, so this allows reusing that memory.

We could easily make the array resizable now, but we haven't heard any
complaints about the current 64 sockets limit.
---
 src/backend/libpq/pqcomm.c          | 18 +++
 src/backend/postmaster/postmaster.c | 76 +++--
 src/include/libpq/libpq.h           |  2 +-
 3 files changed, 36 insertions(+), 60 deletions(-)

diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 8d217b0645..48ae7704fb 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -311,8 +311,9 @@ socket_close(int code, Datum arg)
  * specified.  For TCP ports, hostName is either NULL for all interfaces or
  * the interface to listen on, and unixSocketDir is ignored (can be NULL).
  *
- * Successfully opened sockets are added to the ListenSocket[] array (of
- * length MaxListen), at the first position that isn't PGINVALID_SOCKET.
+ * Successfully opened sockets are appended to the ListenSockets[] array.  The
+ * current size of the array is *NumListenSockets, it is updated to reflect
+ * the opened sockets.  MaxListen is the allocated size of the array.
 *
 * RETURNS: STATUS_OK or STATUS_ERROR
 */
@@ -320,7 +321,7 @@
 int
 StreamServerPort(int family, const char *hostName, unsigned short portNumber,
 				 const char *unixSocketDir,
-				 pgsocket ListenSocket[], int MaxListen)
+				 pgsocket ListenSockets[], int *NumListenSockets, int MaxListen)
 {
 	pgsocket	fd;
 	int			err;
@@ -335,7 +336,6 @@ StreamServerPort(int family, const char *hostName, unsigned short portNumber,
 	struct addrinfo *addrs = NULL,
 			   *addr;
 	struct addrinfo hint;
-	int			listen_index = 0;
 	int			added = 0;
 	char		unixSocketPath[MAXPGPATH];
 #if !defined(WIN32) || defined(IPV6_V6ONLY)
@@ -401,12 +401,7 @@ StreamServerPort(int family, const char *hostName, unsigned short portNumber,
 	}

 		/* See if there is still room to add 1 more socket. */
-		for (; listen_index < MaxListen; listen_index++)
-		{
-			if (ListenSocket[listen_index] == PGINVALID_SOCKET)
-				break;
-		}
-		if (listen_index >= MaxListen)
+		if (*NumListenSockets == MaxListen)
 		{
 			ereport(LOG,
 					(errmsg("could not bind to all requested addresses: MAXLISTEN (%d) exceeded",
@@ -573,7 +568,8 @@ StreamServerPort(int family, const char *hostName, unsigned short portNumber,
 				(errmsg("listening on %s address \"%s\", port %d",
 						familyDesc, addrDesc, (int) portNumber)));

-		ListenSocket[listen_index] = fd;
+		ListenSockets[*NumListenSockets] = fd;
+		(*NumListenSockets)++;
 		added++;
 	}

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index d7bfb28ff3..2659329b82 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -226,7 +226,8 @@ int			ReservedConnections;

 /* The socket(s) we're listening to. */
 #define MAXLISTEN	64
-static pgsocket ListenSocket[MAXLISTEN];
+static int	NumListenSockets = 0;
+static pgsocket *ListenSockets = NULL;

 /* still more option variables */
 bool		EnableSSL = false;
@@ -587,7 +588,6 @@ PostmasterMain(int argc, char *argv[])
 	int			status;
 	char	   *userDoption = NULL;
 	bool		listen_addr_saved = false;
-	int			i;
 	char	   *output_config_variable = NULL;

 	InitProcessGlobals();
@@ -1141,17 +1141,6 @@ PostmasterMain(int argc, char *argv[])
 				 errmsg("could not remove file \"%s\": %m",
 						LOG_METAINFO_DATAFILE)));

-	/*
-	 * Initialize input sockets.
-	 *
-	 * Mark them all closed, and set up an on_proc_exit function that's
-	 * charged with closing the sockets again at postmaster shutdown.
-	 */
-	for (i = 0; i < MAXLISTEN; i++)
-		ListenSocket[i] = PGINVALID_SOCKET;
-
-	on_proc_exit(CloseServerPorts, 0);
-
 	/*
 	 * If enabled, start up syslogger collection
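[Editorial sketch, not part of the original mail.] The refactoring above boils down to one small pattern: replace "scan the array for the first PGINVALID_SOCKET slot" with an explicit element count. A minimal standalone C sketch of that pattern (the names mirror the patch, but the type and helpers here are simplified stand-ins, not the postmaster's real code):

```c
#include <assert.h>

#define MAXLISTEN 64
typedef int pgsocket;               /* simplified stand-in */

static pgsocket ListenSockets[MAXLISTEN];
static int NumListenSockets = 0;    /* statically initialized to 0 */

/* Append a newly opened socket; fails when MAXLISTEN is exceeded. */
static int
add_listen_socket(pgsocket fd)
{
	if (NumListenSockets == MAXLISTEN)
		return -1;              /* "could not bind to all requested addresses" */
	ListenSockets[NumListenSockets++] = fd;
	return 0;
}

/*
 * With an explicit count, a caller (like ConfigurePostmasterWaitSet() in
 * the patch) no longer needs to walk the whole array to count the
 * elements in use.
 */
static int
count_listen_sockets(void)
{
	return NumListenSockets;
}
```

Appending is O(1) and counting is a single read, at the cost of keeping the counter and the array consistent, which is trivial when all mutation goes through one append function.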
Re: Query execution in Perl TAP tests needs work
On Tue, Aug 29, 2023 at 1:48 PM Noah Misch wrote:
> On Mon, Aug 28, 2023 at 05:29:56PM +1200, Thomas Munro wrote:
> > 1. Don't fork anything at all: open (and cache) a connection directly
> > from Perl.
> > 1a. Write xsub or ffi bindings for libpq.  Or vendor (parts) of the
> > popular Perl xsub library?
> > 1b. Write our own mini pure-perl pq client module.  Or vendor (parts)
> > of some existing one.
> > 2. Use long-lived psql sessions.
> > 2a. Something building on BackgroundPsql.
> > 2b. Maybe give psql or a new libpq-wrapper a new low level stdio/pipe
> > protocol that is more fun to talk to from Perl/machines?
>
> (2a) seems adequate and easiest to achieve.  If DBD::Pg were under a
> compatible license, I'd say use it as the vendor for (1a).  Maybe we
> can get it relicensed?  Building a separate way of connecting from
> Perl would be sad.

Here's my minimal POC of 2a.  It only changes ->safe_psql() and not the
various other things like ->psql() and ->poll_query_until().  Hence use
of amcheck/001_verify_heapam as an example: it runs a lot of
safe_psql() queries.  It fails in all kinds of ways if enabled
generally, which would take some investigation (some tests require
there to be no connections at various times, so we'd probably need to
insert disable/re-enable calls at various places).
From 0f6f4b1bdc69a10b1048731d5b1d744567978cce Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Tue, 29 Aug 2023 18:12:07 +1200
Subject: [PATCH] XXX cache psql processes

diff --git a/contrib/amcheck/t/001_verify_heapam.pl b/contrib/amcheck/t/001_verify_heapam.pl
index 46d5b53181..316e876c5d 100644
--- a/contrib/amcheck/t/001_verify_heapam.pl
+++ b/contrib/amcheck/t/001_verify_heapam.pl
@@ -18,6 +18,7 @@
 $node = PostgreSQL::Test::Cluster->new('test');
 $node->init;
 $node->append_conf('postgresql.conf', 'autovacuum=off');
 $node->start;
+$node->enable_auto_background_psql(1);
 $node->safe_psql('postgres', q(CREATE EXTENSION amcheck));

 #
diff --git a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
index 924b57ab21..53ee69c801 100644
--- a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
+++ b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
@@ -221,7 +221,7 @@ sub query
 	$output = $self->{stdout};

 	# remove banner again, our caller doesn't care
-	$output =~ s/\n$banner$//s;
+	$output =~ s/\n?$banner$//s;

 	# clear out output for the next query
 	$self->{stdout} = '';
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 227c34ab4d..f53844e25e 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -986,6 +986,8 @@ sub stop

 	local %ENV = $self->_get_env();

+	$self->_close_auto_background_psqls;
+
 	$mode = 'fast' unless defined $mode;
 	return 1 unless defined $self->{_pid};
@@ -1025,6 +1027,8 @@ sub reload

 	local %ENV = $self->_get_env();

+	$self->_close_auto_background_psqls;
+
 	print "### Reloading node \"$name\"\n";
 	PostgreSQL::Test::Utils::system_or_bail('pg_ctl', '-D', $pgdata,
 		'reload');
@@ -1049,6 +1053,8 @@ sub restart

 	local %ENV = $self->_get_env(PGAPPNAME => undef);

+	$self->_close_auto_background_psqls;
+
 	print "### Restarting node \"$name\"\n";

 	# -w is now the default but having it here does no harm and helps
@@ -1349,7 +1355,9 @@ sub new
 		_logfile_base =>
 		  "$PostgreSQL::Test::Utils::log_path/${testname}_${name}",
 		_logfile =>
-		  "$PostgreSQL::Test::Utils::log_path/${testname}_${name}.log"
+		  "$PostgreSQL::Test::Utils::log_path/${testname}_${name}.log",
+		_auto_background_psqls => {},
+		_use_auto_background_psqls => 0
 	};

 	if ($params{install_path})
@@ -1508,6 +1516,39 @@ sub _get_env
 	return (%inst_env);
 }

+# Private routine to close and forget any psql processes we may be
+# holding onto.
+sub _close_auto_background_psqls
+{
+	my ($self) = @_;
+	foreach my $dbname (keys %{$self->{_auto_background_psqls}})
+	{
+		my $psql = $self->{_auto_background_psqls}{$dbname};
+		delete $self->{_auto_background_psqls}{$dbname};
+		$psql->quit;
+	}
+}
+
+=pod
+
+=item enable_auto_background_psql()
+
+Enable or disable the automatic reuse of long-lived psql sessions.
+
+=cut
+
+sub enable_auto_background_psql
+{
+	my $self = shift;
+	my $value = shift;
+
+	$self->{_use_auto_background_psqls} = $value;
+	if (!$value)
+	{
+		$self->_close_auto_background_psqls;
+	}
+}
+
 # Private routine to get an installation path qualified command.
 #
 # IPC::Run maintains a cache, %cmd_cache, mapping commands to paths.  Tests
@@ -1744,13 +1785,32 @@ sub safe_psql

 	my ($stdout, $stderr);

-	my $ret = $self->psql(
-		$dbname, $sql,
-		%params,
-		stdout => \$stdout,
-		stderr => \$stderr,
-		on_error_die => 1,
-		on_error_stop => 1);
+
+	# If there are no special parameters, and re-use of sessions
+	# hasn't been disabled, create or re-use a long-lived psql
+	# process.
+	if (!%params &&
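[Editorial sketch, not part of the original mail.] The POC's caching scheme can be summarized language-independently: keep one long-lived session per database name, reuse it for subsequent queries, and drop every cached session whenever the node stops, reloads, or restarts. A tiny C sketch of that lifecycle (the Session struct and every name here are hypothetical illustrations of the idea, not the Perl API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_SESSIONS 8

typedef struct
{
	char	dbname[64];
	bool	open;
	int		queries_run;        /* stands in for a live psql handle */
} Session;

static Session sessions[MAX_SESSIONS];

/* Analogue of the POC's lookup in _auto_background_psqls: reuse or create. */
static Session *
get_session(const char *dbname)
{
	for (int i = 0; i < MAX_SESSIONS; i++)
		if (sessions[i].open && strcmp(sessions[i].dbname, dbname) == 0)
			return &sessions[i];        /* cache hit: reuse the session */

	for (int i = 0; i < MAX_SESSIONS; i++)
		if (!sessions[i].open)
		{
			snprintf(sessions[i].dbname, sizeof(sessions[i].dbname),
					 "%s", dbname);
			sessions[i].open = true;
			sessions[i].queries_run = 0;
			return &sessions[i];        /* cache miss: "spawn" a session */
		}
	return NULL;                        /* cache full */
}

/* Analogue of _close_auto_background_psqls, called on stop/reload/restart. */
static void
close_all_sessions(void)
{
	for (int i = 0; i < MAX_SESSIONS; i++)
		sessions[i].open = false;
}
```

The key design point is the same one the POC's stop/reload/restart hooks address: cached sessions must be invalidated at every event that makes them stale, otherwise later queries silently talk to a dead server.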
Re: Autogenerate some wait events code and documentation
On Tue, Aug 29, 2023 at 08:17:10AM +0200, Drouvot, Bertrand wrote:
> Agree that done that way one could easily grep the events from the
> source code and match them with wait_event_names.txt.  Then I don't
> think the "search" issue in the code is still a concern with the
> current proposal.

It could still be able to append WAIT_EVENT_ to the first column of the
file.  I'd just rather keep it shorter.

> FWIW, I'm getting:
>
> $ git am v3-000*
> Applying: Rename wait events with more consistent camelcase style
> Applying: Remove column for wait event names in wait_event_names.txt
> error: patch failed: src/backend/utils/activity/wait_event_names.txt:261
> error: src/backend/utils/activity/wait_event_names.txt: patch does not apply
> Patch failed at 0002 Remove column for wait event names in
> wait_event_names.txt

That may be a bug in the matrix because of bb90022, as git am can be
easily pissed.  I am attaching a new patch series, but it does not seem
to matter here.  I have double-checked the docs generated, while on it,
and I am not seeing anything missing, particularly for the LWLock and
Lock parts..

--
Michael

From 7e221db0fe26f9d36df90202934f4177daa86ff7 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Mon, 28 Aug 2023 14:47:13 +0900
Subject: [PATCH v4 1/2] Rename wait events with more consistent camelcase
 style

This will help in the introduction of more simplifications with the
generation of wait event data using wait_event_names.txt.  The event
names used the same case-insensitive characters, hence applying lower()
or upper() to the monitoring queries allows the detection of the same
events as before this change.
---
 src/backend/libpq/pqmq.c                        |  2 +-
 src/backend/storage/ipc/shm_mq.c                |  6 +-
 .../utils/activity/wait_event_names.txt         | 90 +--
 3 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/src/backend/libpq/pqmq.c b/src/backend/libpq/pqmq.c
index 253181f47a..38b042804c 100644
--- a/src/backend/libpq/pqmq.c
+++ b/src/backend/libpq/pqmq.c
@@ -182,7 +182,7 @@ mq_putmessage(char msgtype, const char *s, size_t len)
 			break;

 		(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
-						 WAIT_EVENT_MQ_PUT_MESSAGE);
+						 WAIT_EVENT_MESSAGE_QUEUE_PUT_MESSAGE);

 		ResetLatch(MyLatch);
 		CHECK_FOR_INTERRUPTS();
 	}
diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index d134a09a19..06d6b73527 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -1017,7 +1017,7 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 			 * cheaper than setting one that has been reset.
 			 */
 			(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
-							 WAIT_EVENT_MQ_SEND);
+							 WAIT_EVENT_MESSAGE_QUEUE_SEND);

 			/* Reset the latch so we don't spin. */
 			ResetLatch(MyLatch);
@@ -1163,7 +1163,7 @@ shm_mq_receive_bytes(shm_mq_handle *mqh, Size bytes_needed, bool nowait,
 			 * setting one that has been reset.
 			 */
 			(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
-							 WAIT_EVENT_MQ_RECEIVE);
+							 WAIT_EVENT_MESSAGE_QUEUE_RECEIVE);

 			/* Reset the latch so we don't spin. */
 			ResetLatch(MyLatch);
@@ -1252,7 +1252,7 @@ shm_mq_wait_internal(shm_mq *mq, PGPROC **ptr, BackgroundWorkerHandle *handle)

 		/* Wait to be signaled. */
 		(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
-						 WAIT_EVENT_MQ_INTERNAL);
+						 WAIT_EVENT_MESSAGE_QUEUE_INTERNAL);

 		/* Reset the latch so we don't spin. */
 		ResetLatch(MyLatch);
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 13774254d2..0cace8f40a 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -45,15 +45,15 @@ Section: ClassName - WaitEventActivity

 WAIT_EVENT_ARCHIVER_MAIN	ArchiverMain	"Waiting in main loop of archiver process."
-WAIT_EVENT_AUTOVACUUM_MAIN	AutoVacuumMain	"Waiting in main loop of autovacuum launcher process."
-WAIT_EVENT_BGWRITER_HIBERNATE	BgWriterHibernate	"Waiting in background writer process, hibernating."
-WAIT_EVENT_BGWRITER_MAIN	BgWriterMain	"Waiting in main loop of background writer process."
+WAIT_EVENT_AUTOVACUUM_MAIN	AutovacuumMain	"Waiting in main loop of autovacuum launcher process."
+WAIT_EVENT_BGWRITER_HIBERNATE	BgwriterHibernate	"Waiting in background writer process, hibernating."
+WAIT_EVENT_BGWRITER_MAIN	BgwriterMain	"Waiting in main loop of background writer process."
 WAIT_EVENT_CHECKPOINTER_MAIN	CheckpointerMain	"Waiting in main loop of checkpointer process."
 WAIT_EVENT_LOGICAL_APPLY_MAIN	LogicalApplyMain	"Waiting in main loop of logical replication apply process."
 WAIT_EVENT_LOGICAL_LAUNCHER_MAIN	LogicalLauncherMain	"Waiting in main loop of logical replication launcher process."
 WAIT_EVENT_LOGICAL_PARALLEL_APPLY_MAIN	LogicalParallelApplyMain	"Waiting in
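[Editorial sketch, not part of the original mail.] The generation rule discussed in this thread — "the wait event names like 'FooBar' are generated from the enums by splitting using their underscores and doing some lc()" — is easy to sketch in C (the real logic lives in a Perl build script; this function is only an illustration of the rule):

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Derive a camelcase wait event name from an enum suffix: split on
 * underscores, capitalize the first letter of each part, lowercase the
 * rest.  E.g. AUTOVACUUM_MAIN -> AutovacuumMain.
 */
static void
camelcase_from_enum(const char *enum_suffix, char *out, size_t outlen)
{
	size_t	j = 0;
	int		start_of_word = 1;

	if (outlen == 0)
		return;

	for (const char *p = enum_suffix; *p && j + 1 < outlen; p++)
	{
		if (*p == '_')
		{
			start_of_word = 1;  /* underscores are dropped */
			continue;
		}
		out[j++] = start_of_word ? (char) toupper((unsigned char) *p)
								 : (char) tolower((unsigned char) *p);
		start_of_word = 0;
	}
	out[j] = '\0';
}
```

Note how this rule forces the renames in the patch above: the old hand-written "BgWriterMain" cannot be derived from BGWRITER_MAIN by any per-word rule, so it becomes "BgwriterMain".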
Re: Logger process and "LOG: could not close client or listen socket: Bad file descriptor"
On Tue, Aug 29, 2023 at 09:27:48AM +0300, Heikki Linnakangas wrote:
> Just to close the loop on this thread: I committed and backpatched
> Michael's fix.

Thanks!

--
Michael
Re: Logger process and "LOG: could not close client or listen socket: Bad file descriptor"
On 29/08/2023 06:18, Peter Geoghegan wrote:
> On Sun, Aug 27, 2023 at 5:52 PM Michael Paquier wrote:
>> From what I can see, this is a rather old issue, because
>> ListenSocket[] is filled with PGINVALID_SOCKET *after* starting the
>> syslogger.  It seems to me that we should just initialize the array
>> before starting the syslogger, so as we don't get these incorrect
>> logs?  Thoughts?  Please see the attached.
>
> Agreed, this is very annoying.  I'm going to start using your patch
> with the feature branch I'm working on.  Hopefully that won't be
> necessary for too much longer.

Just to close the loop on this thread: I committed and backpatched
Michael's fix.

Discussion on other thread at
https://www.postgresql.org/message-id/9caed67f-f93e-5701-8c25-265a2b139ed0%40iki.fi.

--
Heikki Linnakangas
Neon (https://neon.tech)
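[Editorial sketch, not part of the original mail.] The bug fixed here is easy to reproduce in miniature: a statically allocated socket array is zero-filled, and 0 is a valid file descriptor (stdin), so "closing all listen sockets" before the array has been set to the invalid-socket sentinel means closing fd 0 over and over. A hedged sketch (the array and helpers are simplified stand-ins for the postmaster's code):

```c
#include <assert.h>

#define MAXLISTEN 64
#define PGINVALID_SOCKET (-1)
typedef int pgsocket;

static pgsocket listen_sockets[MAXLISTEN];  /* static storage => all zeros */

/* The fix: mark every slot closed *before* any child process is forked. */
static void
init_listen_sockets(void)
{
	for (int i = 0; i < MAXLISTEN; i++)
		listen_sockets[i] = PGINVALID_SOCKET;
}

/*
 * How many close() calls would a forked child issue?  On the zeroed,
 * uninitialized array every slot "holds" fd 0, so the child would try to
 * close stdin MAXLISTEN times -- the symptom reported in this thread.
 */
static int
count_close_attempts(void)
{
	int		n = 0;

	for (int i = 0; i < MAXLISTEN; i++)
		if (listen_sockets[i] != PGINVALID_SOCKET)
			n++;
	return n;
}
```

Ordering is the whole fix: once the sentinel initialization happens before the first fork, the child's cleanup loop finds nothing to close.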
Re: [PoC] pg_upgrade: allow to upgrade publisher node
On Mon, Aug 28, 2023 at 1:01 PM Peter Smith wrote:
>
> Hi, here are my review comments for v26-0003
>
> It seems I must defend some of my previous suggestions from v25* [1],
> so here goes...
>
> ======
> src/bin/pg_upgrade/check.c
>
> 1. check_and_dump_old_cluster
>
> CURRENT CODE (with v26-0003 patch applied)
>
> /* Extract a list of logical replication slots */
> get_old_cluster_logical_slot_infos();
>
> ...
>
> /*
>  * Logical replication slots can be migrated since PG17. See comments atop
>  * get_old_cluster_logical_slot_infos().
>  */
> if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
> {
>     check_old_cluster_for_lost_slots();
>
>     /*
>      * Do additional checks if a live check is not required. This requires
>      * that confirmed_flush_lsn of all the slots is the same as the latest
>      * checkpoint location, but it would be satisfied only when the server
>      * has been shut down.
>      */
>     if (!live_check)
>         check_old_cluster_for_confirmed_flush_lsn();
> }
>
> SUGGESTION
>
> /*
>  * Logical replication slots can be migrated since PG17. See comments atop
>  * get_old_cluster_logical_slot_infos().
>  */
> if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700) // NOTE 1a.
> {
>     /* Extract a list of logical replication slots */
>     get_old_cluster_logical_slot_infos();
>
>     if (count_old_cluster_slots()) // NOTE 1b.
>     {
>         check_old_cluster_for_lost_slots();
>
>         /*
>          * Do additional checks if a live check is not required. This requires
>          * that confirmed_flush_lsn of all the slots is the same as the latest
>          * checkpoint location, but it would be satisfied only when the server
>          * has been shut down.
>          */
>         if (!live_check)
>             check_old_cluster_for_confirmed_flush_lsn();
>     }
> }

I think a slightly better way to achieve this is to combine the code
from check_old_cluster_for_lost_slots() and
check_old_cluster_for_confirmed_flush_lsn() into
check_old_cluster_for_valid_slots().  That will even save us a new
connection for the second check.

Also, I think we can simplify another check in the patch:

@@ -1446,8 +1446,10 @@ check_new_cluster_logical_replication_slots(void)
 	char	   *wal_level;

 	/* Logical slots can be migrated since PG17. */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
-		nslots = count_old_cluster_logical_slots();
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+		return;
+
+	nslots = count_old_cluster_logical_slots();

--
With Regards,
Amit Kapila.
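[Editorial sketch, not part of the original mail.] The suggested "<= 1600" gate works because pg_upgrade compares truncated major version numbers. A minimal sketch of that check (the macro is a simplified stand-in for pg_upgrade's GET_MAJOR_VERSION, which divides the full version number by 100, so e.g. 160002 becomes 1600):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in: PostgreSQL encodes 16.2 as 160002, 17.0 as 170000. */
#define GET_MAJOR_VERSION(v) ((v) / 100)

/* Logical slots can be migrated since PG17, so bail out early otherwise. */
static bool
can_migrate_logical_slots(int old_cluster_version)
{
	if (GET_MAJOR_VERSION(old_cluster_version) <= 1600)
		return false;
	return true;
}
```

Writing the gate as an early "return" on "<= 1600" (rather than wrapping the rest of the function in ">= 1700") is exactly the style preference from the review: old-version clusters are rejected up front and the main body stays unindented.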
Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)
On 29/08/2023 01:28, Michael Paquier wrote:
> In case you've not noticed:
> https://www.postgresql.org/message-id/zovvuqe0rdj2s...@paquier.xyz
> But it does not really matter now ;)

Ah sorry, missed that thread.

>> Yes, so it seems.  Syslogger is started before the ListenSockets
>> array is initialized, so it's still all zeros.  When syslogger is
>> forked, the child process tries to close all the listen sockets,
>> which are all zeros.  So syslogger calls close(0) MAXLISTEN (64)
>> times.
>>
>> Attached patch moves the array initialization earlier.
>
> Yep, I've reached the same conclusion.  Wouldn't it be cleaner to move
> the callback registration of CloseServerPorts() closer to the array
> initialization, though?

Ok, pushed that way.  I checked the history of this: it goes back to
commit 9a86f03b4e in version 13.  The SysLogger_Start() call used to be
later, after setting up ListenSockets, but that commit moved it.  So I
backpatched this to v13.

>> The first close(0) actually does have an effect: it closes stdin,
>> which is fd 0.  That is surely accidental, but I wonder if we should
>> indeed close stdin in child processes?  The postmaster process
>> doesn't do anything with stdin either, although I guess a library
>> might try to read a passphrase from stdin before starting up, for
>> example.
>
> We would have heard about that, wouldn't we?  I may be missing
> something of course, but on HEAD, the array initialization is done
> before starting any child processes, and the syslogger is the first
> one forked.

Yes, syslogger is the only process that closes stdin.  After moving the
initialization, it doesn't close it either.

Thinking about this some more, the ListenSockets array is a bit silly
anyway.  We fill the array starting from index 0, always append to the
end, and never remove entries from it.  It would seem more
straightforward to keep track of the used size of the array.  Currently
we always loop through the unused parts too, and e.g.
ConfigurePostmasterWaitSet() needs to walk the array to count how many
elements are in use.

--
Heikki Linnakangas
Neon (https://neon.tech)
Re: Autogenerate some wait events code and documentation
Hi,

On 8/28/23 10:04 AM, Michael Paquier wrote:
> On Mon, Jul 17, 2023 at 10:16:02AM +0900, Michael Paquier wrote:
>> So you mean to switch a line that now looks like that:
>> WAIT_EVENT_FOO_BAR	FooBar	"Waiting on Foo Bar."
>> To that:
>> FOO_BAR	"Waiting on Foo Bar."
>> Or even that:
>> WAIT_EVENT_FOO_BAR	"Waiting on Foo Bar."
>>
>> Sure, it is an improvement for any wait events that use WAIT_EVENT_
>> when searching them, but this adds more magic into the LWLock and
>> Lock areas if the same conversion is applied there.  Or am I right to
>> assume that you'd mean to *not* do any of that for these two classes?
>> These can be treated as exceptions in the script when generating the
>> wait event names from the enum elements, of course.
>
> I have looked again at that, and switching wait_event_names.txt to use
> two columns made of the typedef definitions and the docs like that is
> not a problem:
> FOO_BAR	"Waiting on Foo Bar."
>
> WAIT_EVENT_ is appended to the typedef definitions in the script.  The
> wait event names like "FooBar" are generated from the enums by
> splitting using their underscores and doing some lc().  Lock and
> LWLock don't need to change.  This way, it is easy to grep the wait
> events from the source code and match them with wait_event_names.txt.
>
> Thoughts or comments?

Agree that done that way one could easily grep the events from the
source code and match them with wait_event_names.txt.  Then I don't
think the "search" issue in the code is still a concern with the
current proposal.

FWIW, I'm getting:

$ git am v3-000*
Applying: Rename wait events with more consistent camelcase style
Applying: Remove column for wait event names in wait_event_names.txt
error: patch failed: src/backend/utils/activity/wait_event_names.txt:261
error: src/backend/utils/activity/wait_event_names.txt: patch does not apply
Patch failed at 0002 Remove column for wait event names in
wait_event_names.txt

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com