Re: Add new option 'all' to pg_stat_reset_shared()
On Wed, Nov 15, 2023 at 04:25:14PM +0900, Michael Paquier wrote:
> Other than that, it looks OK.

Tweaked the queries of this one slightly, and applied.  So I think
that we are now good for this thread.  Thanks, all!
--
Michael
Re: Infinite Interval
On Tue, Nov 14, 2023 at 4:39 PM Dean Rasheed wrote:
>
> On Thu, 9 Nov 2023 at 12:49, Dean Rasheed wrote:
> >
> > OK, I have pushed 0001 and 0002. Here's the remaining (main) patch.
> >
>
> OK, I have now pushed the main patch.

Thanks a lot Dean.

--
Best Wishes,
Ashutosh Bapat
Re: retire MemoryContextResetAndDeleteChildren backwards compatibility macro
On Wed, Nov 15, 2023 at 9:26 PM Nathan Bossart wrote:
> On Wed, Nov 15, 2023 at 09:27:18AM +0530, Amul Sul wrote:
> > Nevermind, I usually use git apply or git am; here are those errors:
> >
> > PG/ - (master) $ git apply ~/Downloads/retire_compatibility_macro_v1.patch
> > error: patch failed: src/backend/access/brin/brin.c:297
> > error: src/backend/access/brin/brin.c: patch does not apply
>
> I wonder if your mail client is modifying the patch.  Do you have the same
> issue if you download it from the archives?

Yes, you are correct. Surprisingly, the archive version applied cleanly.
Gmail is doing something; I usually use the web login in the Chrome
browser, and I have never faced such issues with others' patches. Anyway,
I will try both versions the next time I hit this kind of issue. Sorry
for the noise.

Regards,
Amul
Re: Adding facility for injection points (or probe points?) for more advanced tests
On Wed, Nov 15, 2023 at 12:21:40PM +0100, Alvaro Herrera wrote:
> On 2023-Nov-15, Michael Paquier wrote:
>
> Oh, I think you're overthinking what my comment was. I was saying, just
> name it "InjectionPointsHash". Since there appears to be no room for
> another hash table for injection points, there's no need to specify
> that this one is the ByName hash. I couldn't think of any other way to
> organize the injection points either.

Aha, OK. No problem. This was itching me as well, but I didn't see an
argument for changing these names, so I've renamed things a bit more.

>> Yes, still not something that's required in the core APIs or an
>> initial batch.
>
> I agree that we can do the easy thing first and build it up later. I
> just hope we don't get too wedded to the current interface because of
> lack of time in the current release, such that we get stuck with it.

One thing that I assume we will need with more advanced testing is the
possibility to pass down one or more arguments (or a structure of data)
to the callback a point is attached to. For that, it is possible to add
more macros, like INJECTION_POINT_1ARG, INJECTION_POINT_ARG(), etc.,
that use some (void *) pointers. It would be possible to add that even
in stable branches, as the need arises, changing the structure of the
shmem hash table if required to control the way a callback is run. At
the end, I suspect that it is one of these things where we'll need to
adapt depending on what people want to do with this stuff. FWIW, I can
already do quite a bit with this minimalistic design and an external
extension. Custom wait events are also very handy to monitor a wait.

>> I am not sure that it is a good idea to enforce a specific conditional
>> logic in the backend core code.
>
> Agreed, let's get more experience on what other types of tests people
> want to build, and how things are going to interact with each other.

I've worked more on polishing the core facility today, with 0001
introducing the backend-side facility. One thing that I mentioned as
lacking is a local cache for processes, so that they don't load an
external callback more than once when it is run. So I have added this
local cache. When a point is scanned but not found, a previous cache
entry is cleared if any (this leaks, but we don't have a way to unload
stuff, and for testing that's not a big deal). I've renamed the routines
to use attach and detach as terms, and adopted the name you've suggested
for the macro. The names around the hash table and its entries have been
changed to what you've suggested. You are right, that's more intuitive.

0002 is the test module for the basics, split into its own patch, with a
couple of tests for the local cache part. 0003 and 0004 have been
adjusted with the new SQL functions.

At the end, I'd like to propose 0004, as this case has been a PITA for
me and I don't want to break it again. It needs more work and could be
made cheaper. One more argument in favor of it is the addition of
condition variables to the wait and wake points (perhaps with even more
variables?) in the test module. If there is interest in 0003, I'm OK to
work more on it as well, but that's less important IMV.

Thoughts and comments are welcome, with a v4 series attached.
--
Michael

From 8febfda427ae773dd3c4c66ca9c932b3fe4cbc10 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Thu, 16 Nov 2023 14:41:41 +0900
Subject: [PATCH v4 1/4] Add backend facility for injection points

This adds a set of routines allowing developers to attach, detach and
run custom code based on arbitrary code paths set with a centralized
macro called INJECTION_POINT().  Injection points are registered in a
shared hash table.  Processes also use a local cache to avoid loading
callbacks more than necessary, cleaning up their cache if a callback
is found to have been removed.
---
 src/include/pg_config.h.in               |   3 +
 src/include/utils/injection_point.h      |  36 ++
 src/backend/storage/ipc/ipci.c           |   3 +
 src/backend/storage/lmgr/lwlocknames.txt |   1 +
 .../utils/activity/wait_event_names.txt   |   1 +
 src/backend/utils/misc/Makefile          |   1 +
 src/backend/utils/misc/injection_point.c | 349 ++++++
 src/backend/utils/misc/meson.build       |   1 +
 doc/src/sgml/installation.sgml           |  30 ++
 doc/src/sgml/xfunc.sgml                  |  56 +++
 configure                                |  34 ++
 configure.ac                             |   7 +
 meson.build                              |   1 +
 meson_options.txt                        |   3 +
 src/Makefile.global.in                   |   1 +
 src/tools/pgindent/typedefs.list         |   2 +
 16 files changed, 529 insertions(+)
 create mode 100644 src/include/utils/injection_point.h
 create mode 100644 src/backend/utils/misc/injection_point.c

diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index d8a2985567..7a976821e5 100644
---
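For illustration, here is a minimal sketch of how the argument-passing
macros mentioned above could look, assuming a (void *) payload. The
names INJECTION_POINT_1ARG, InjectionPointRun1Arg and the one-argument
callback type are hypothetical, not part of the posted v4 series:

/*
 * Compile to nothing unless the build enables injection points, so
 * production builds pay no cost.
 */
#ifdef USE_INJECTION_POINTS
#define INJECTION_POINT(name) InjectionPointRun(name)
#define INJECTION_POINT_1ARG(name, arg) \
	InjectionPointRun1Arg(name, (void *) (arg))
#else
#define INJECTION_POINT(name) ((void) 0)
#define INJECTION_POINT_1ARG(name, arg) ((void) 0)
#endif

/* Callback type extended with one opaque argument. */
typedef void (*InjectionPointCallback1Arg) (const char *name, void *arg);

Adding such variants later would only extend, not break, the minimal
INJECTION_POINT() interface discussed above.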
Re: Synchronizing slots from primary to standby
On Tue, Nov 14, 2023 at 7:56 PM Drouvot, Bertrand wrote:
>
> Hi,
>
> On 11/13/23 2:57 PM, Zhijie Hou (Fujitsu) wrote:
> > On Friday, November 10, 2023 4:16 PM Drouvot, Bertrand wrote:
> >> Yeah good point, agree to just error out in all the cases then (if we
> >> discard the sync_ reserved wording proposal, which seems to be the
> >> case, as it is probably not worth the extra work).
> >
> > Thanks for the discussion!
> >
> > Here is the V33 patch set which includes the following changes:
>
> Thanks for working on it!
>
> > 1) Drop slots with state 'i' in promotion flow after we shut down
> > WalReceiver.
>
> @@ -3557,10 +3558,15 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
>           * this only after failure, so when you promote, we still
>           * finish replaying as much as we can from archive and
>           * pg_wal before failover.
> +         *
> +         * Drop the slots for which sync is initiated but not yet
> +         * completed i.e. they are still waiting for the primary
> +         * server to catch up.
>           */
>          if (StandbyMode && CheckForStandbyTrigger())
>          {
>              XLogShutdownWalRcv();
> +             slotsync_drop_initiated_slots();
>              return XLREAD_FAIL;
>          }
>
> I had a closer look and it seems this is not located at the right place.
>
> Indeed, it's added here:
>
>      switch (currentSource)
>      {
>          case XLOG_FROM_ARCHIVE:
>          case XLOG_FROM_PG_WAL:
>
> While in our case we are in
>
>          case XLOG_FROM_STREAM:
>
> So I think we should move slotsync_drop_initiated_slots() into the
> XLOG_FROM_STREAM case. Maybe before shutting down the sync slot worker?
> (the TODO item number 2 you mentioned up-thread)
>
> BTW, in order to prevent any corner case, wouldn't it also be better to
> replace:
>
> +    /*
> +     * Do not allow consumption of a "synchronized" slot until the standby
> +     * gets promoted.
> +     */
> +    if (RecoveryInProgress() && (slot->data.sync_state != SYNCSLOT_STATE_NONE))
>
> with something like:
>
>     if ((RecoveryInProgress() && (slot->data.sync_state != SYNCSLOT_STATE_NONE))
>         || slot->data.sync_state == SYNCSLOT_STATE_INITIATED)
>
> to ensure slots in the 'i' state can never be used?

Yes, it makes sense. Will do it.

> Regards,
>
> --
> Bertrand Drouvot
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
RE: Random pg_upgrade test failure on drongo
Dear hackers,

> While tracking the buildfarm, I found that drongo failed the test
> pg_upgrade/003_logical_slots [1].
> A strange point is that the test passed in the next iteration.
> Currently I'm not sure of the reason, but I will keep my eye on it and
> will investigate if it happens again.

This email is just an update. We found that fairywren also failed for
the same reason [2]. It fails inconsistently, so there might be a
Windows-specific issue. I'm now trying to reproduce it with my
colleagues to analyze it in more detail. I am also working with Andrew
to get the logs emitted during the upgrade. I will continue to keep an
eye on it.

[2]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2023-11-08%2010%3A22%3A45

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Re: remaining sql/json patches
Hi Erik,

On Thu, Nov 16, 2023 at 13:52 Erik Rijkers wrote:
> On 11/15/23 at 14:00, Amit Langote wrote:
> > Hi,
> > [..]
> >
> > Attached updated patch.  The version of 0001 that I posted on Oct 11
> > to add the error-safe version of CoerceViaIO contained many
> > unnecessary bits that are now removed.
> >
> > --
> > Thanks, Amit Langote
> > EDB: http://www.enterprisedb.com
>
> > [v24-0001-Add-soft-error-handling-to-some-expression-nodes.patch]
> > [v24-0002-Add-soft-error-handling-to-populate_record_field.patch]
> > [v24-0003-SQL-JSON-query-functions.patch]
> > [v24-0004-JSON_TABLE.patch]
> > [v24-0005-Claim-SQL-standard-compliance-for-SQL-JSON-featu.patch]
>
> Hi Amit,
>
> Here is a statement that seems to gobble up all memory and to totally
> lock up the machine. No ctrl-C - only a power reset gets me out of
> that. It was in one of my tests, so it used to work:
>
> select json_query(
>   jsonb '"[3,4]"'
>   , '$[*]' returning bigint[] empty object on error
> );
>
> Can you have a look?

Wow, will look. Thanks.
Re: remaining sql/json patches
On 11/15/23 at 14:00, Amit Langote wrote:
> Hi,
> [..]
>
> Attached updated patch.  The version of 0001 that I posted on Oct 11
> to add the error-safe version of CoerceViaIO contained many
> unnecessary bits that are now removed.
>
> --
> Thanks, Amit Langote
> EDB: http://www.enterprisedb.com

> [v24-0001-Add-soft-error-handling-to-some-expression-nodes.patch]
> [v24-0002-Add-soft-error-handling-to-populate_record_field.patch]
> [v24-0003-SQL-JSON-query-functions.patch]
> [v24-0004-JSON_TABLE.patch]
> [v24-0005-Claim-SQL-standard-compliance-for-SQL-JSON-featu.patch]

Hi Amit,

Here is a statement that seems to gobble up all memory and to totally
lock up the machine. No ctrl-C - only a power reset gets me out of that.
It was in one of my tests, so it used to work:

select json_query(
  jsonb '"[3,4]"'
  , '$[*]' returning bigint[] empty object on error
);

Can you have a look?

Thanks,

Erik
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Thu, Nov 16, 2023 at 12:18 PM Amit Kapila wrote:
>
> On Thu, Nov 16, 2023 at 3:48 AM Peter Smith wrote:
> >
> > ~
> >
> > SUGGESTION (#1a and #1b)
> >
> > ereport(log_replication_commands ? LOG : DEBUG1,
> >         errmsg(SlotIsLogical(s)
> >                ? "acquired logical replication slot \"%s\""
> >                : "acquired physical replication slot \"%s\"",
> >                NameStr(s->data.name)));
> >
> > ~~~
>
> Personally, I prefer the way Bharath had in his patch. Did you see any
> preferred way in the existing code?

Not really. I think an errmsg combined with a ternary is not so common.
I couldn't find many examples, so I wouldn't try to claim anything is a
"preferred" way. There are some existing examples, like Bharath had:

ereport(NOTICE,
        (errcode(ERRCODE_DUPLICATE_OBJECT),
         collencoding == -1
         ? errmsg("collation \"%s\" already exists, skipping",
                  collname)
         : errmsg("collation \"%s\" for encoding \"%s\" already exists, skipping",
                  collname, pg_encoding_to_char(collencoding))));

OTOH, when there are different numbers of substitution parameters in
each of the errmsgs like that, you don't have much choice but to do it
that way.

I am fine with whatever is chosen -- I was only offering an alternative.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Thu, Nov 16, 2023 at 12:36 PM Amit Kapila wrote:
>
> On Wed, Nov 15, 2023 at 3:58 PM Alvaro Herrera wrote:
> >
> > Translation-wise, this doesn't work, because you're building a string.
> > There's no reason to think that the words "logical" and "physical"
> > should stay untranslated; the message would make no sense, or at least
> > would be very ugly.
> >
> > You should do something like
> >
> > if (am_walsender)
> > {
> >     ereport(log_replication_commands ? LOG : DEBUG1,
> >             SlotIsLogical(s) ? errmsg("acquired logical replication slot \"%s\"", NameStr(s->data.name)) :
> >             errmsg("acquired physical replication slot \"%s\"", NameStr(s->data.name)));
> > }
> >
> > (Obviously, lose the "translator:" comments since they are unnecessary)
> >
> > If you really want to keep the "logical"/"physical" word untranslated,
> > you need to split it out of the sentence somehow.  But it would be
> > really horrible IMO.  Like
> >
> > errmsg("acquired replication slot \"%s\" of type \"%s\"",
> >        NameStr(s->data.name), SlotIsLogical(s) ? "logical" : "physical")
> >
>
> Thanks for the suggestion. I would like to clarify this a bit. What
> do you exactly mean by splitting it out of the sentence? For example,
> in one of the existing messages:
>
> ereport(LOG,
>         /* translator: %s is SIGKILL or SIGABRT */
>         (errmsg("issuing %s to recalcitrant children",
>                 send_abort_for_kill ? "SIGABRT" : "SIGKILL")));
>
> Do here the words SIGABRT/SIGKILL remain untranslated due to the
> translator's comment? I thought this was similar to the message being
> proposed, but it seems this message construction follows the
> translation rules better.

IIUC, that example is different because "SIGABRT" / "SIGKILL" are not
real words, so you don't want the translator to attempt to translate
them. You want them to appear in the message as-is.

OTOH, in this patch "logical" and "physical" are just normal English
words that should be translated as part of the original message, e.g.
like in these similar messages:

- msgid "database \"%s\" is used by an active logical replication slot"
- msgstr "la base de données « %s » est utilisée par un slot de réplication logique actif"

==
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: pg_upgrade and logical replication
Here are some review comments for patch v14-0001

==
src/backend/utils/adt/pg_upgrade_support.c

1. binary_upgrade_replorigin_advance

+ /* lock to prevent the replication origin from vanishing */
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ originid = replorigin_by_name(originname, false);

Use uppercase for the lock comment.

==
src/bin/pg_upgrade/check.c

2. check_for_subscription_state

> > + prep_status("Checking for subscription state");
> > +
> > + snprintf(output_path, sizeof(output_path), "%s/%s",
> > +          log_opts.basedir,
> > +          "subscription_state.txt");
> >
> > I felt this filename ought to be more like
> > 'subscriptions_with_bad_state.txt' because the current name looks like
> > a normal logfile with nothing to indicate that it is only for the
> > states of the "bad" subscriptions.
>
> I have kept the file name intentionally short, as we noticed that
> when the upgrade-of-the-publisher patch used a longer name there were
> some buildfarm failures because of longer names.

OK, but how about some other short meaningful name like
'subs_invalid.txt'?

I also thought "state" in the original name was misleading because this
file contains not only subscriptions with a bad 'state' but also
subscriptions with a missing 'origin'.

~~~

3. check_new_cluster_logical_replication_slots

  int nslots_on_old;
  int nslots_on_new;
+ int nsubs_on_old = old_cluster.subscription_count;

I felt it might be better to make both these quantities 'unsigned' to
make it more obvious that there are no special meanings for negative
numbers.

~~~

4. check_new_cluster_logical_replication_slots

  nslots_on_old = count_old_cluster_logical_slots();

~

IMO the 'nsubs_on_old' should be coded the same as above. AFAICT, this
is the only code where you are interested in the number of subscribers,
and furthermore, it seems you only care about that count in the *old*
cluster. This means the current implementation of
get_subscription_count() seems more generic than it needs to be, and
that results in more unnecessary patch code. (I will repeat this same
review comment in the other relevant places.)

SUGGESTION
  nslots_on_old = count_old_cluster_logical_slots();
  nsubs_on_old = count_old_cluster_subscriptions();

~~~

5.
+ /*
+  * Quick return if there are no logical slots and subscriptions to be
+  * migrated.
+  */
+ if (nslots_on_old == 0 && nsubs_on_old == 0)
    return;

/and subscriptions/and no subscriptions/

~~~

6.
- if (nslots_on_old > max_replication_slots)
+ if (nslots_on_old && nslots_on_old > max_replication_slots)
    pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
             "logical replication slots (%d) on the old cluster",
             max_replication_slots, nslots_on_old);

Neither nslots_on_old nor max_replication_slots can be < 0, so I don't
see why the additional check is needed here. AFAICT
"if (nslots_on_old > max_replication_slots)" achieves the same thing
that you want.

~~~

7.
+ if (nsubs_on_old && nsubs_on_old > max_replication_slots)
+   pg_fatal("max_replication_slots (%d) must be greater than or equal to the number of "
+            "subscriptions (%d) on the old cluster",
+            max_replication_slots, nsubs_on_old);

Neither nsubs_on_old nor max_replication_slots can be < 0, so I don't
see why the additional check is needed here. AFAICT
"if (nsubs_on_old > max_replication_slots)" achieves the same thing
that you want.

==
src/bin/pg_upgrade/info.c

8. get_db_rel_and_slot_infos

+ if (cluster == &old_cluster)
+   get_subscription_count(cluster);
+

I felt this is unnecessary because you only want to know the
nsubs_on_old in one place, and then only for the old cluster, so calling
this to set a generic attribute for the cluster is overkill.

~~~

9.
+/*
+ * Get the number of subscriptions in the old cluster.
+ */
+static void
+get_subscription_count(ClusterInfo *cluster)
+{
+  PGconn    *conn;
+  PGresult  *res;
+
+  if (GET_MAJOR_VERSION(cluster->major_version) < 1700)
+    return;
+
+  conn = connectToServer(cluster, "template1");
+  res = executeQueryOrDie(conn,
+                          "SELECT oid FROM pg_catalog.pg_subscription");
+
+  cluster->subscription_count = PQntuples(res);
+
+  PQclear(res);
+  PQfinish(conn);
+}

9a.
Currently, this is needed only for the old_cluster (as the function
comment implies), so the parameter is not required. Also, AFAICT this
number is only needed in one place
(check_new_cluster_logical_replication_slots), so IMO it would be better
to make lots of changes to simplify this code:
- change the function name to be like the other one, e.g.
  count_old_cluster_subscriptions()
- make the function return unsigned

SUGGESTION (something like this...)

unsigned
count_old_cluster_subscriptions(void)
{
  unsigned nsubs = 0;

  if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
  {
    PGconn *conn = connectToServer(&old_cluster, "template1");
    PGresult *res = executeQueryOrDie(conn,
                                      "SELECT oid FROM pg_catalog.pg_subscription");
    nsubs = PQntuples(res);
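For reference, a minimal sketch of how the truncated SUGGESTION above
could be completed, under the same assumptions (names and structure
follow the review comment, not any committed code; note it also reads
old_cluster directly rather than an undefined 'cluster' parameter):

static unsigned int
count_old_cluster_subscriptions(void)
{
	unsigned int nsubs = 0;

	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
	{
		PGconn	   *conn = connectToServer(&old_cluster, "template1");
		PGresult   *res = executeQueryOrDie(conn,
											"SELECT oid FROM pg_catalog.pg_subscription");

		nsubs = PQntuples(res);

		PQclear(res);
		PQfinish(conn);
	}

	return nsubs;
}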
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Wed, Nov 15, 2023 at 3:58 PM Alvaro Herrera wrote:
>
> Translation-wise, this doesn't work, because you're building a string.
> There's no reason to think that the words "logical" and "physical"
> should stay untranslated; the message would make no sense, or at least
> would be very ugly.
>
> You should do something like
>
> if (am_walsender)
> {
>     ereport(log_replication_commands ? LOG : DEBUG1,
>             SlotIsLogical(s) ? errmsg("acquired logical replication slot \"%s\"", NameStr(s->data.name)) :
>             errmsg("acquired physical replication slot \"%s\"", NameStr(s->data.name)));
> }
>
> (Obviously, lose the "translator:" comments since they are unnecessary)
>
> If you really want to keep the "logical"/"physical" word untranslated,
> you need to split it out of the sentence somehow.  But it would be
> really horrible IMO.  Like
>
> errmsg("acquired replication slot \"%s\" of type \"%s\"",
>        NameStr(s->data.name), SlotIsLogical(s) ? "logical" : "physical")
>

Thanks for the suggestion. I would like to clarify this a bit. What do
you exactly mean by splitting it out of the sentence? For example, in
one of the existing messages:

ereport(LOG,
        /* translator: %s is SIGKILL or SIGABRT */
        (errmsg("issuing %s to recalcitrant children",
                send_abort_for_kill ? "SIGABRT" : "SIGKILL")));

Do the words SIGABRT/SIGKILL here remain untranslated due to the
translator's comment? I thought this was similar to the message being
proposed, but it seems this message construction follows the translation
rules better.

--
With Regards,
Amit Kapila.
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Thu, Nov 16, 2023 at 3:48 AM Peter Smith wrote:
>
> ~
>
> SUGGESTION (#1a and #1b)
>
> ereport(log_replication_commands ? LOG : DEBUG1,
>         errmsg(SlotIsLogical(s)
>                ? "acquired logical replication slot \"%s\""
>                : "acquired physical replication slot \"%s\"",
>                NameStr(s->data.name)));
>
> ~~~
>

Personally, I prefer the way Bharath had in his patch. Did you see any
preferred way in the existing code?

--
With Regards,
Amit Kapila.
Re: Remove MSVC scripts from the tree
On Wed, Nov 15, 2023 at 11:27:13AM +0100, Peter Eisentraut wrote:
> (Note, however, that your rebase didn't pick up commits e7814b40d0 and
> b41b1a7f49 that I did yesterday.  Please check that again.)

Indeed.  I need to absorb that properly.
--
Michael
Re: Tab completion for CREATE TABLE ... AS
On Wed, Nov 15, 2023 at 05:26:58PM +0300, Gilles Darold wrote:
> Right, I don't know how I have missed the sql-createtableas page in the
> documentation.
>
> Patch v2 fixes the keyword list; I have also sorted the CREATE TABLE
> completion in alphabetical order (AS was at the end of the list).
>
> It has also been rebased on current master.

Fun.  It has failed to apply here.  Anyway, I can see that a comment
update has been forgotten.  A second thing is that it requires two more
lines to add the query keywords for the case where a CTAS has a list of
column names.  I've added both changes, and applied the patch on HEAD.
That's not all the patterns possible, but this covers the most useful
ones.
--
Michael
Re: [HACKERS] Allow INSTEAD OF DELETE triggers to modify the tuple for RETURNING
On Fri, Nov 3, 2023 at 12:34 AM Marko Tiikkaja wrote:
>
> I am now. Thanks! :-)  Will try to keep an eye on the builds in the
> future.
>
> Attached v4 of the patch, which should fix the issue.
>

The docs still seem to have an issue:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F45%2F4617

In the regress test, do we need to clean up the created objects after we
use them?

The tests passed. Looking at ExecIRInsertTriggers, your changes look
sane.
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
On Thu, Nov 9, 2023 at 4:12 AM Tom Lane wrote:
>
> Daniel Gustafsson writes:
> >> On 8 Nov 2023, at 19:18, Tom Lane wrote:
> >> I think an actually usable feature of this sort would involve
> >> copying all the failed lines to some alternate output medium,
> >> perhaps a second table with a TEXT column to receive the original
> >> data line.  (Or maybe an array of text that could receive the
> >> broken-down field values?)  Maybe we could dump the message info,
> >> line number, field name etc into additional columns.
>
> > I agree that the errors should be easily visible to the user in some
> > way.  The feature is for sure interesting, especially in data
> > warehouse type jobs where dirty data is often ingested.
>
> I agree it's interesting, but we need to get it right the first time.
>
> Here is a very straw-man-level sketch of what I think might work.
> The option to COPY FROM looks something like
>
>     ERRORS TO other_table_name (item [, item [, ...]])
>
> where the "items" are keywords identifying the information item
> we will insert into each successive column of the target table.
> This design allows the user to decide which items are of use
> to them.  I envision items like
>
>     LINENO    bigint   COPY line number, counting from 1
>     LINE      text     raw text of line (after encoding conversion)
>     FIELDS    text[]   separated, de-escaped string fields (the data
>                        that was or would be fed to input functions)
>     FIELD     text     name of troublesome field, if field-specific
>     MESSAGE   text     error message text
>     DETAIL    text     error message detail, if any
>     SQLSTATE  text     error SQLSTATE code
>

With just SAVE ERRORS, automatically create a table to hold the errors
(validating the uniqueness of the auto-generated table name, and
validating the CREATE privilege), and the table would have the above
related info. If there are no errors, the table gets dropped.
Re: Add minimal C example and SQL registration example for custom table access methods.
Suggestion: in the C example you added, you mention in the comment:

+ /* Methods from TableAmRoutine omitted from example, but all
+    non-optional ones must be provided here. */

Perhaps you could provide a "see ..." reference to point the reader
finding your example to where these non-optional methods he must
provide can be found?

Nitpicking a little: your patch appears to change more lines than it
really does, because it adds line breaks earlier in the lines. I would
generally avoid that unless there's a good reason to do so.
Re: lazy_scan_heap() should release lock on buffer before vacuuming FSM
Hi,

On 2023-11-15 16:32:48 -0500, Robert Haas wrote:
> On Mon, Nov 13, 2023 at 8:26 PM Andres Freund wrote:
> > I think this undersells the situation a bit. We right now do
> > FreeSpaceMapVacuumRange() for 8GB of data (VACUUM_FSM_EVERY_PAGES) in
> > the main fork, while holding an exclusive page level lock.
>
> That sounds fairly horrific?

It's decidedly not great, indeed. I couldn't come up with a clear risk
of deadlock, but I wouldn't want to bet against there being a deadlock
risk. I think the rarity of it does ameliorate the performance issues
to some degree.

Thoughts on whether to backpatch? It'd probably be at least a bit
painful; there have been a lot of changes in the surrounding code in the
last 5 years.

- Andres
Re: WaitEventSet resource leakage
Heikki Linnakangas writes:
> On 09/03/2023 20:51, Tom Lane wrote:
>> After further thought that seems like a pretty ad-hoc solution.
>> We probably can do no better in the back branches, but shouldn't
>> we start treating WaitEventSets as ResourceOwner-managed resources?
>> Otherwise, transient WaitEventSets are going to be a permanent
>> source of headaches.

> Let's change it so that it's always allocated in TopMemoryContext, but
> pass a ResourceOwner instead:
>
> WaitEventSet *
> CreateWaitEventSet(ResourceOwner owner, int nevents)
>
> And use owner == NULL to mean session lifetime.

WFM.  (I didn't study your back-branch patch.)

			regards, tom lane
Re: On non-Windows, hard depend on uselocale(3)
Thomas Munro writes:
> Idea #1
>
> For output, which happens with sprintf(ptr, "%.15g%s", ...) in
> execute.c, perhaps we could use our in-tree Ryu routine instead?
>
> For input, which happens with strtod() in data.c, rats, we don't have
> a parser and I understand that it is not for the faint of heart

Yeah.  Getting rid of ecpg's use of uselocale() would certainly be
nice, but I'm not ready to add our own implementation of strtod() to
get there.

> Idea #2
>
> Perhaps we could use snprintf_l() and strtod_l() where available.
> They're not standard, but they are obvious extensions that NetBSD and
> Windows have, and those are the two systems for which we are doing
> grotty things in that code.

Oooh, shiny.  I do not see any man page for strtod_l, but I do see that
it's declared on mamba's host.  I wonder how long they've had it?  The
man page for snprintf_l appears to be quite ancient, so we could hope
that strtod_l is available on all versions anyone cares about.

> That would amount to extending
> pg_locale.c's philosophy: either you must have uselocale(), or the
> full set of _l() functions (that POSIX failed to standardise, dunno
> what the history is behind that, seems weird).

Yeah.  I'd say the _l functions should be preferred over uselocale()
if available, but sadly they're not there on common systems.  (It looks
like glibc has strtod_l but not snprintf_l, which is odd.)

			regards, tom lane
Re: WaitEventSet resource leakage
(Alexander just reminded me of this off-list)

On 09/03/2023 20:51, Tom Lane wrote:
> In [1] I wrote:
>> PG Bug reporting form writes:
>>> The following script:
>>> [ leaks a file descriptor per error ]
>>
>> Yeah, at least on platforms where WaitEventSets own kernel file
>> descriptors.  I don't think it's postgres_fdw's fault though, but
>> that of ExecAppendAsyncEventWait, which is ignoring the possibility
>> of failing partway through.  It looks like it'd be sufficient to add
>> a PG_CATCH or PG_FINALLY block there to make sure the WaitEventSet is
>> disposed of properly --- fortunately, it doesn't need to have any
>> longer lifespan than that one function.

Here's a patch to do that. For back branches.

> After further thought that seems like a pretty ad-hoc solution.
> We probably can do no better in the back branches, but shouldn't
> we start treating WaitEventSets as ResourceOwner-managed resources?
> Otherwise, transient WaitEventSets are going to be a permanent
> source of headaches.

Agreed. The current signature of CreateWaitEventSet is:

WaitEventSet *
CreateWaitEventSet(MemoryContext context, int nevents)

Passing a MemoryContext makes little sense when the WaitEventSet also
holds file descriptors. With anything other than TopMemoryContext, you
need to arrange for proper cleanup with PG_TRY-PG_CATCH or by avoiding
ereport() calls. And once you've arranged for cleanup, the memory
context doesn't matter much anymore.

Let's change it so that it's always allocated in TopMemoryContext, but
pass a ResourceOwner instead:

WaitEventSet *
CreateWaitEventSet(ResourceOwner owner, int nevents)

And use owner == NULL to mean session lifetime.

--
Heikki Linnakangas
Neon (https://neon.tech)

From b9ea609855b838369cddb33e4045ac91603dd726 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas
Date: Wed, 15 Nov 2023 23:44:56 +0100
Subject: [PATCH v1 1/1] Fix resource leak when a FDW's ForeignAsyncRequest
 function fails

If an error is thrown after calling CreateWaitEventSet(), the memory of
a WaitEventSet is freed as it's allocated in the short-lived memory
context, but the file descriptor (on epoll- or kqueue-based systems) or
handles (on Windows) that it contains are leaked.

Use PG_TRY-FINALLY to ensure it gets freed.

In passing, fix a misleading comment on what the 'nevents' argument to
WaitEventSetWait means.

The added test doesn't check for leaking resources, so it passed even
before this commit. But at least it covers the code path.

Report by Alexander Lakhin, analysis and suggestion for the fix by Tom
Lane.
Discussion: https://www.postgresql.org/message-id/17828-122da8cba2323...@postgresql.org
Discussion: https://www.postgresql.org/message-id/472235.1678387...@sss.pgh.pa.us
---
 .../postgres_fdw/expected/postgres_fdw.out |  7 ++
 contrib/postgres_fdw/sql/postgres_fdw.sql  |  6 ++
 src/backend/executor/nodeAppend.c          | 66 +++
 3 files changed, 50 insertions(+), 29 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 64bcc66b8d..22cae37a1e 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -10809,6 +10809,13 @@ SELECT * FROM result_tbl ORDER BY a;
 (2 rows)
 
 DELETE FROM result_tbl;
+-- Test error handling, if accessing one of the foreign partitions errors out
+CREATE FOREIGN TABLE async_p_broken PARTITION OF async_pt FOR VALUES FROM (1) TO (10001)
+  SERVER loopback OPTIONS (table_name 'non_existent_table');
+SELECT * FROM async_pt;
+ERROR:  relation "public.non_existent_table" does not exist
+CONTEXT:  remote SQL command: SELECT a, b, c FROM public.non_existent_table
+DROP FOREIGN TABLE async_p_broken;
 -- Check case where multiple partitions use the same connection
 CREATE TABLE base_tbl3 (a int, b int, c text);
 CREATE FOREIGN TABLE async_p3 PARTITION OF async_pt FOR VALUES FROM (3000) TO (4000)
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 2d14eeadb5..075da4ff86 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3607,6 +3607,12 @@ INSERT INTO result_tbl SELECT a, b, 'AAA' || c FROM async_pt WHERE b === 505;
 SELECT * FROM result_tbl ORDER BY a;
 
 DELETE FROM result_tbl;
 
+-- Test error handling, if accessing one of the foreign partitions errors out
+CREATE FOREIGN TABLE async_p_broken PARTITION OF async_pt FOR VALUES FROM (1) TO (10001)
+  SERVER loopback OPTIONS (table_name 'non_existent_table');
+SELECT * FROM async_pt;
+DROP FOREIGN TABLE async_p_broken;
+
 -- Check case where multiple partitions use the same connection
 CREATE TABLE base_tbl3 (a int, b int, c text);
 CREATE FOREIGN TABLE async_p3 PARTITION OF async_pt FOR VALUES FROM (3000) TO (4000)
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 609df6b9e6..99818d3ebc 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/execut
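The shape of the back-branch fix, reduced to its essentials, is the
usual PG_TRY/PG_FINALLY pattern, so the WaitEventSet's kernel objects
are released even when a callback throws. The following is only an
illustrative sketch (nevents, timeout and occurred_events stand in for
the real local state in ExecAppendAsyncEventWait):

WaitEventSet *set = CreateWaitEventSet(CurrentMemoryContext, nevents);

PG_TRY();
{
	AddWaitEventToSet(set, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
	/* ... FDW ForeignAsyncConfigureWait callbacks may error out here ... */
	(void) WaitEventSetWait(set, timeout, occurred_events, nevents,
							WAIT_EVENT_APPEND_READY);
}
PG_FINALLY();
{
	FreeWaitEventSet(set);
}
PG_END_TRY();

Under the proposed ResourceOwner API, the explicit PG_TRY would become
unnecessary: CreateWaitEventSet(CurrentResourceOwner, nevents) would
leave error cleanup to the resource owner machinery.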
Re: On non-Windows, hard depend on uselocale(3)
On Thu, Nov 16, 2023 at 10:17 AM Tom Lane wrote:
> Thomas Munro writes:
> > The other uses of uselocale() are in ECPG code that must
> > be falling back to the setlocale() path.  In other words, isn't it
> > the case that we don't require uselocale() to compile ECPG stuff, but
> > it'll probably crash or corrupt itself or give wrong answers if you
> > push it on NetBSD, so... uhh, really we do require it?
>
> Dunno.  mamba is getting through the ecpg regression tests okay,
> but we all know that doesn't prove a lot.  (AFAICS, ecpg only
> cares about this to the extent of not wanting an LC_NUMERIC
> locale where the decimal point isn't '.'.  I'm not sure that
> NetBSD supports any such locale anyway --- I think they're like
> OpenBSD in having only pro-forma locale support.)

Idea #1

For output, which happens with sprintf(ptr, "%.15g%s", ...) in
execute.c, perhaps we could use our in-tree Ryu routine instead?

For input, which happens with strtod() in data.c, rats, we don't have a
parser, and I understand that it is not for the faint of heart (a naive
implementation gets subtle things wrong; cf. "How to read floating point
numbers accurately" by W D Clinger, plus whatever improvements have
happened in this space since 1990).

Idea #2

Perhaps we could use snprintf_l() and strtod_l() where available.
They're not standard, but they are obvious extensions that NetBSD and
Windows have, and those are the two systems for which we are doing
grotty things in that code. That would amount to extending pg_locale.c's
philosophy: either you must have uselocale(), or the full set of _l()
functions (which POSIX failed to standardise; dunno what the history is
behind that, seems weird).
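To make idea #2 concrete, here is a sketch of what a locale-independent
strtod wrapper for ecpg could look like. HAVE_STRTOD_L is a hypothetical
configure symbol here, on Windows the function is spelled _strtod_l(),
and nothing below is from an actual patch:

#include <locale.h>
#include <stdlib.h>

static double
strtod_c(const char *str, char **endptr, locale_t c_locale)
{
#if defined(HAVE_STRTOD_L)
	/* NetBSD (and Windows, as _strtod_l) provide this extension. */
	return strtod_l(str, endptr, c_locale);
#else
	/* Fall back to switching the thread's locale around the call. */
	locale_t	save = uselocale(c_locale);
	double		val = strtod(str, endptr);

	uselocale(save);
	return val;
#endif
}

The output side would be analogous, with snprintf_l() where available
and uselocale() otherwise.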
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
Some minor comments for v17-0001.

==

1.
+ ereport(log_replication_commands ? LOG : DEBUG1,
+         SlotIsLogical(s)
+         /* translator: %s is name of the replication slot */
+         ? errmsg("acquired logical replication slot \"%s\"",
+                  NameStr(s->data.name))
+         : errmsg("acquired physical replication slot \"%s\"",
+                  NameStr(s->data.name)));

1a.
FWIW, if the ternary was inside the errmsg, there would be less code
duplication.

~

1b.
I searched the HEAD code and did not find any "translator:" comments
for just ordinary slot name substitutions like these; AFAICT that
comment is not necessary anymore.

~

SUGGESTION (#1a and #1b)

ereport(log_replication_commands ? LOG : DEBUG1,
        errmsg(SlotIsLogical(s)
               ? "acquired logical replication slot \"%s\""
               : "acquired physical replication slot \"%s\"",
               NameStr(s->data.name)));

~~~

2. Ditto for the other ereport.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Popcount optimization using AVX512
On Wed, Nov 15, 2023 at 08:27:57PM +0000, Shankaran, Akash wrote:
> AVX512 has light and heavy instructions. While the heavy AVX512
> instructions have clock frequency implications, the light instructions
> not so much. See [0] for more details. We captured EMON data for the
> benchmark used in this work, and see that the instructions are using
> the licensing level not meant for heavy AVX512 operations. This means
> the instructions for popcount, _mm512_popcnt_epi64() and
> _mm512_reduce_add_epi64(), are not going to have any significant
> impact on CPU clock frequency.
>
> Clock frequency impact aside, we measured the same benchmark for gains
> on older Intel hardware and observe up to 18% better performance on
> Intel Icelake. On older Intel hardware, the popcntdq 512 instruction
> is not present, so it won't work. If clock frequency is not affected,
> the rest of the workload should not be impacted in the case of mixed
> workloads.

Thanks for sharing your analysis.

> Testing this on smaller block sizes < 8KiB shows that AVX512, compared
> to the current 64-bit behavior, gives slightly lower performance, but
> with a large variance. We cannot conclude much from it. The testing
> with the ANALYZE benchmark by Nathan also points to no visible impact
> as a result of using AVX512. The gains on larger datasets are easily
> evident, with less variance.
>
> What are your thoughts if we introduce AVX512 popcount for smaller
> sizes as an optional feature initially, and then test it more
> thoroughly over time on this particular use case?

I don't see any need to rush this. At the very earliest, this feature
would go into v17, which doesn't enter feature freeze until April 2024.
That seems like enough time to complete any additional testing you'd
like to do. However, if you are seeing worse performance with this
patch, then it seems unlikely that we'd want to proceed.

> Thoughts or feedback on the approach in the patch? This solution
> should not impact anyone who doesn't use the feature, i.e. AVX512.
> Open to additional ideas if this doesn't seem like the right approach
> here.

It's true that it wouldn't impact anyone not using the feature, but
there's also a decent chance that this code goes virtually untested. As
I've stated elsewhere [0], I think we should ensure there's buildfarm
coverage for this kind of architecture-specific stuff.

[0] https://postgr.es/m/20230726043707.GB3211130%40nathanxps13

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
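For readers following along, the kernel under discussion is roughly the
following (a sketch, not the proposed patch; it assumes a compiler with
AVX512F and AVX512VPOPCNTDQ support, e.g. gcc/clang with
-mavx512f -mavx512vpopcntdq):

#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

static uint64_t
popcount_avx512(const uint64_t *buf, size_t nwords)
{
	__m512i		accum = _mm512_setzero_si512();
	uint64_t	cnt;
	size_t		i;

	/* Count eight 64-bit words per iteration. */
	for (i = 0; i + 8 <= nwords; i += 8)
	{
		__m512i		val = _mm512_loadu_si512((const void *) (buf + i));

		accum = _mm512_add_epi64(accum, _mm512_popcnt_epi64(val));
	}

	/* Sum the eight per-lane counts. */
	cnt = (uint64_t) _mm512_reduce_add_epi64(accum);

	/* Scalar tail for any remainder. */
	for (; i < nwords; i++)
		cnt += (uint64_t) __builtin_popcountll(buf[i]);

	return cnt;
}

The small-input question above is exactly about the crossover point
where the vector loop starts to win over the scalar tail.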
Re: lazy_scan_heap() should release lock on buffer before vacuuming FSM
On Mon, Nov 13, 2023 at 8:26 PM Andres Freund wrote:
> I think this undersells the situation a bit. We right now do
> FreeSpaceMapVacuumRange() for 8GB of data (VACUUM_FSM_EVERY_PAGES) in
> the main fork, while holding an exclusive page level lock.

That sounds fairly horrific?

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: lazy_scan_heap() should release lock on buffer before vacuuming FSM
On Tue, Nov 14, 2023 at 7:15 PM Andres Freund wrote:
>
> Hi,
>
> On 2023-11-14 07:46:10 -0500, Melanie Plageman wrote:
> > > FreeSpaceMapVacuumRange()'s comment says:
> > > * As above, but assume that only heap pages between start and end-1 inclusive
> > > * have new free-space information, so update only the upper-level slots
> > > * covering that block range.  end == InvalidBlockNumber is equivalent to
> > > * "all the rest of the relation".
> > >
> > > So FreeSpaceMapVacuumRange(..., blkno) will not actually process the
> > > "effects" of the RecordPageWithFreeSpace() above it - which seems
> > > confusing.
> >
> > Ah, so shall I pass blkno + 1 as end?
>
> I think there's no actual overflow danger, because MaxBlockNumber + 1
> is InvalidBlockNumber, which scans the rest of the relation (i.e.
> exactly the intended block). Perhaps worth noting?

Attached.

> > > =# DELETE FROM copytest_0;
> > > =# VACUUM (VERBOSE) copytest_0;
> > > ...
> > > INFO: 0: table "copytest_0": truncated 146264 to 110934 pages
> > > ...
> > > tuples missed: 5848 dead from 89 pages not removed due to cleanup lock contention
> > > ...
> > >
> > > A bit of debugging later I figured out that this is due to the
> > > background writer. If I SIGSTOP bgwriter, the whole relation is
> > > truncated...
> >
> > That's a bit sad. But isn't that what you would expect bgwriter to
> > do? Write out dirty buffers? It doesn't know that those pages consist
> > of only dead tuples and that you will soon truncate them.
>
> I think the issue is more that it feels wrong that a pin by bgwriter
> blocks vacuum from cleaning up. I think the same happens with on-access
> pruning - where it's more likely for bgwriter to focus on those pages.
> Checkpointer likely causes the same. Even normal backends can cause
> this while writing out the page.
>
> It's not like bgwriter/checkpointer would actually have a problem if
> the page were pruned - the page is locked while being written out and
> neither keeps pointers into the page after unlocking it again.
>
> At this point I started to wonder if we should invent a separate type
> of pin for a buffer undergoing IO. We basically have the necessary
> state already: BM_IO_IN_PROGRESS. We'd need to look at that in some
> places, e.g. in InvalidateBuffer(), instead of
> BUF_STATE_GET_REFCOUNT(), we'd need to also look at BM_IO_IN_PROGRESS.
>
> Except of course that that'd not change anything -
> ConditionalLockBufferForCleanup() locks the buffer conditionally,
> *before* even looking at the refcount, and returns false if not. And
> writing out a buffer takes a content lock. Which made me realize that
>
> "tuples missed: 5848 dead from 89 pages not removed due to cleanup lock contention"
>
> is often kinda wrong - the contention is *not* cleanup lock specific.
> It's often just plain contention on the lwlock.

Do you think we should change the error message?

> Perhaps we should just treat IO_IN_PROGRESS buffers differently in
> lazy_scan_heap() and heap_page_prune_opt() and wait?

Hmm. This is an interesting idea. lazy_scan_heap() waiting for
checkpointer to write out a buffer could have an interesting property of
shifting who writes out the FPI. I wonder if it would matter.

- Melanie

From b1af221d22cfcfbc63497d45c2d0f8ab28c720ab Mon Sep 17 00:00:00 2001
From: Melanie Plageman
Date: Mon, 13 Nov 2023 16:39:25 -0500
Subject: [PATCH v2] Release lock on heap buffer before vacuuming FSM

When there are no indexes on a table, we vacuum each heap block after
pruning it and then update the freespace map.
Periodically, we also vacuum the freespace map. This was done while
unnecessarily holding a lock on the heap page. Release the lock before
calling FreeSpaceMapVacuumRange() and, while we're at it, ensure the
range includes the heap block we just vacuumed.

Discussion: https://postgr.es/m/CAAKRu_YiL%3D44GvGnt1dpYouDSSoV7wzxVoXs8m3p311rp-TVQQ%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 30 +---
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6985d299b2..59f51f40e1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1046,18 +1046,6 @@ lazy_scan_heap(LVRelState *vacrel)
 			/* Forget the LP_DEAD items that we just vacuumed */
 			dead_items->num_items = 0;
 
-			/*
-			 * Periodically perform FSM vacuuming to make newly-freed
-			 * space visible on upper FSM pages. Note we have not yet
-			 * performed FSM processing for blkno.
-			 */
-			if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
-			{
-				FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
-										blkno);
-				next_fsm_block_to_vacuum = blkno;
-			}
-
 			/*
 			 * Now perform FSM processing for blkno, and move on to next
 			 * page.
@@ -1071,6 +1059,24 @@ lazy_scan_heap(LVRelState *vacrel)
 			UnlockReleaseBuffer(b
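For clarity, the reordered no-indexes path in lazy_scan_heap() would be
shaped roughly like this (an illustrative fragment following the patch
above, using vacuumlazy.c's variable names; not the committed hunk):

/* Forget the LP_DEAD items that we just vacuumed */
dead_items->num_items = 0;

/* Perform FSM processing for blkno, dropping the buffer lock first. */
freespace = PageGetHeapFreeSpace(page);
UnlockReleaseBuffer(buf);
RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);

/*
 * Periodically make newly-freed space visible on upper FSM pages, now
 * without holding the page lock; blkno + 1 makes the range include the
 * block just processed.
 */
if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
{
	FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
							blkno + 1);
	next_fsm_block_to_vacuum = blkno + 1;
}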
Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression
On 15.11.23 13:26, Amul Sul wrote:
>> Question: Why are you using AT_PASS_ADD_OTHERCONSTR? I don't know if
>> it's right or wrong, but if you have a specific reason, it would be
>> good to know.
>
> I referred to ALTER COLUMN DEFAULT and used that.

Hmm, I'm not sure if that is a good comparison. For ALTER TABLE, SET
DEFAULT is just a catalog manipulation, it doesn't change any data, so
it's pretty easy. SET EXPRESSION changes data, which other phases might
want to inspect? For example, if you do SET EXPRESSION and add a
constraint in the same ALTER TABLE statement, do those run in the
correct order?

>> I think ATExecSetExpression() needs to lock pg_attribute? Did you
>> lose that during the refactoring?
>
> I have removed that intentionally, since we were not updating anything
> in pg_attribute, like ALTER DROP EXPRESSION.

ok

>> Tiny comment: The error message in ATExecSetExpression() does not
>> need to mention "stored", since it would be also applicable to
>> virtual generated columns in the future.
>
> I had the same thought, but later decided that when we do that virtual
> column thing, we could simply change it then. I am fine with making
> that change now as well; let me know your thoughts.

Not a big deal, but I would change it now.

Another small thing I found: In ATExecColumnDefault(), there is an
errhint() that suggests DROP EXPRESSION instead of DROP DEFAULT. You
could now add another hint that suggests SET EXPRESSION instead of SET
DEFAULT.
Re: On non-Windows, hard depend on uselocale(3)
Thomas Munro writes:
> Currently pg_locale.c requires systems to have *either* uselocale() or
> mbstowcs_l()/wcstombs_l(), but NetBSD satisfies the second
> requirement.

Check.

> The other uses of uselocale() are in ECPG code that must
> be falling back to the setlocale() path.  In other words, isn't it the
> case that we don't require uselocale() to compile ECPG stuff, but
> it'll probably crash or corrupt itself or give wrong answers if you
> push it on NetBSD, so... uhh, really we do require it?

Dunno.  mamba is getting through the ecpg regression tests okay,
but we all know that doesn't prove a lot.  (AFAICS, ecpg only
cares about this to the extent of not wanting an LC_NUMERIC
locale where the decimal point isn't '.'.  I'm not sure that
NetBSD supports any such locale anyway --- I think they're like
OpenBSD in having only pro-forma locale support.)

			regards, tom lane
Re: typo in fallback implementation for pg_atomic_test_set_flag()
On Wed, Nov 15, 2023 at 09:52:34AM -0600, Nathan Bossart wrote:
> On Tue, Nov 14, 2023 at 07:17:32PM -0800, Andres Freund wrote:
> > Are you planning to apply the fix?
>
> Yes, I'll take care of it.

Committed and back-patched.  I probably could've skipped back-patching
this one since it doesn't seem to be causing any problems yet, but I
didn't see any reason not to back-patch, either.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: On non-Windows, hard depend on uselocale(3)
On Thu, Nov 16, 2023 at 9:51 AM Tom Lane wrote:
> Thomas Munro writes:
> > On Thu, Nov 16, 2023 at 6:45 AM Tom Lane wrote:
> >> You would need to do some research and try to prove that that won't
> >> be a problem on any modern platform.  Presumably it once was a
> >> problem, or we'd not have bothered with a configure check.
>
> > According to data I scraped from the build farm, the last two
> > systems we had that didn't have uselocale() were curculio (OpenBSD
> > 5.9) and wrasse (Solaris 11.3), but those were both shut down
> > (though wrasse still runs old branches) as they were well out of
> > support.
>
> AFAICS, NetBSD still doesn't have it.  They have no on-line man page
> for it, and my animal mamba shows it as not found.

Oh :-( I see that, but I had missed that sidewinder was NetBSD, and my
scraped data predates mamba. Sorry for the wrong info.

Currently pg_locale.c requires systems to have *either* uselocale() or
mbstowcs_l()/wcstombs_l(), but NetBSD satisfies the second requirement.
The other uses of uselocale() are in ECPG code that must be falling back
to the setlocale() path. In other words, isn't it the case that we don't
require uselocale() to compile ECPG stuff, but it'll probably crash or
corrupt itself or give wrong answers if you push it on NetBSD, so...
uhh, really we do require it?
Re: On non-Windows, hard depend on uselocale(3)
Thomas Munro writes:
> On Thu, Nov 16, 2023 at 6:45 AM Tom Lane wrote:
>> You would need to do some research and try to prove that that won't
>> be a problem on any modern platform.  Presumably it once was a
>> problem, or we'd not have bothered with a configure check.

> According to data I scraped from the build farm, the last two systems
> we had that didn't have uselocale() were curculio (OpenBSD 5.9) and
> wrasse (Solaris 11.3), but those were both shut down (though wrasse
> still runs old branches) as they were well out of support.

AFAICS, NetBSD still doesn't have it.  They have no on-line man page
for it, and my animal mamba shows it as not found.

			regards, tom lane
Re: Allow tests to pass in OpenSSL FIPS mode
Daniel Gustafsson writes:
> Since the 3DES/DES deprecations aren't limited to FIPS, do we want to
> do anything for pgcrypto where we have DES/3DES encryption?  Maybe a
> doc patch which mentions the deprecation with a link to the SP could
> be in order?

A docs patch that marks both MD5 and 3DES as deprecated is probably
appropriate, but it seems like a matter for a separate thread and patch.

In the meantime, I've done a pass of review of Peter's v4 patches.
v4-0001 is already committed, so that's not considered here.

v4-0002: I think it is worth splitting up contrib/pgcrypto's pgp-encrypt
test, which has only one test case whose output changes, and a bunch of
others that don't.  v5-0002, attached, does it like that.  It's
otherwise the same as v4.  (It might be worth doing something similar
for uuid_ossp's test, but I have not bothered here.  That test script is
stable enough that I'm not too worried about future maintenance.)

The attached 0003, 0004, 0005 patches are identical to Peter's.

I think that it is possibly worth modifying the password test so that we
don't fail to create the roles, so as to reduce the delta between
password.out and password_1.out (and thereby ease future maintenance of
those files).  However you might disagree, so I split my proposal out as
a separate patch v5-0007-password-test-delta.patch; you can drop that
from the set if you don't like it.

v5-0006-allow-for-disabled-3DES.patch adds the necessary expected file
to make that pass on my Fedora 38 system.

With or without 0007, as you choose, I think it's committable.

			regards, tom lane

diff --git a/contrib/pgcrypto/Makefile b/contrib/pgcrypto/Makefile
index 7fb59f51b7..5efa10c334 100644
--- a/contrib/pgcrypto/Makefile
+++ b/contrib/pgcrypto/Makefile
@@ -42,7 +42,7 @@ PGFILEDESC = "pgcrypto - cryptographic functions"
 REGRESS = init md5 sha1 hmac-md5 hmac-sha1 blowfish rijndael \
 	sha2 des 3des cast5 \
 	crypt-des crypt-md5 crypt-blowfish crypt-xdes \
-	pgp-armor pgp-decrypt pgp-encrypt $(CF_PGP_TESTS) \
+	pgp-armor pgp-decrypt pgp-encrypt pgp-encrypt-md5 $(CF_PGP_TESTS) \
 	pgp-pubkey-decrypt pgp-pubkey-encrypt pgp-info
 
 EXTRA_CLEAN = gen-rtab
diff --git a/contrib/pgcrypto/expected/crypt-md5_1.out b/contrib/pgcrypto/expected/crypt-md5_1.out
new file mode 100644
index 00..0ffda34ab4
--- /dev/null
+++ b/contrib/pgcrypto/expected/crypt-md5_1.out
@@ -0,0 +1,16 @@
+--
+-- crypt() and gen_salt(): md5
+--
+SELECT crypt('', '$1$Szzz0yzz');
+ERROR:  crypt(3) returned NULL
+SELECT crypt('foox', '$1$Szzz0yzz');
+ERROR:  crypt(3) returned NULL
+CREATE TABLE ctest (data text, res text, salt text);
+INSERT INTO ctest VALUES ('password', '', '');
+UPDATE ctest SET salt = gen_salt('md5');
+UPDATE ctest SET res = crypt(data, salt);
+ERROR:  crypt(3) returned NULL
+SELECT res = crypt(data, res) AS "worked"
+FROM ctest;
+ERROR:  invalid salt
+DROP TABLE ctest;
diff --git a/contrib/pgcrypto/expected/hmac-md5_1.out b/contrib/pgcrypto/expected/hmac-md5_1.out
new file mode 100644
index 00..56875b0f63
--- /dev/null
+++ b/contrib/pgcrypto/expected/hmac-md5_1.out
@@ -0,0 +1,44 @@
+--
+-- HMAC-MD5
+--
+SELECT hmac(
+'Hi There',
+'\x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b'::bytea,
+'md5');
+ERROR:  Cannot use "md5": Cipher cannot be initialized
+-- 2
+SELECT hmac(
+'Jefe',
+'what do ya want for nothing?',
+'md5');
+ERROR:  Cannot use "md5": Cipher cannot be initialized
+-- 3
+SELECT hmac(
+'\x'::bytea,
+'\x'::bytea,
+'md5');
+ERROR:  Cannot use "md5": Cipher cannot be initialized
+-- 4
+SELECT hmac(
+'\xcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcd'::bytea,
+'\x0102030405060708090a0b0c0d0e0f10111213141516171819'::bytea,
+'md5');
+ERROR:  Cannot use "md5": Cipher cannot be initialized
+-- 5
+SELECT hmac(
+'Test With Truncation',
+'\x0c0c0c0c0c0c0c0c0c0c0c0c0c0c0c0c'::bytea,
+'md5');
+ERROR:  Cannot use "md5": Cipher cannot be initialized
+-- 6
+SELECT hmac(
+'Test Using Larger Than Block-Size Key - Hash Key First',
+'\x'::bytea,
+'md5');
+ERROR:  Cannot use "md5": Cipher cannot be initialized
+-- 7
+SELECT hmac(
+'Test Using Larger Than Block-Size Key and Larger Than One Block-Size Data',
+'\x'::bytea,
+'md5');
+ERROR:  Cannot use "md5": Cipher cannot be initialized
diff --git a/contrib/pgcrypto/expected/md5_1.out b/contrib/pgcrypto/expected/md5_1.out
new file mode 100644
index 00..decb215c48
--- /dev/null
+++ b/contrib/pgcrypto/expected/md5_1.out
@@ -0,0 +1,17 @@
+--
+-- MD5 message digest
+--
+SELECT digest('
Re: Some performance degradation in REL_16 vs REL_15
Hi, On 2023-11-15 10:09:06 -0500, Tom Lane wrote: > "Anton A. Melnikov" writes: > > I can't understand why i get the opposite results on my pc and on the > > server. It is clear that the absolute > > TPS values will be different for various configurations. This is normal. > > But differences? > > Is it unlikely that some kind of reference configuration is needed to > > accurately > > measure the difference in performance. Probably something wrong with my pc, > > but now > > i can not figure out what's wrong. > > > Would be very grateful for any advice or comments to clarify this problem. > > Benchmarking is hard :-(. Indeed. > IME it's absolutely typical to see variations of a couple of percent even > when "nothing has changed", for example after modifying some code that's > nowhere near any hot code path for the test case. I usually attribute this > to cache effects, such as a couple of bits of hot code now sharing or not > sharing a cache line. FWIW, I think we're overusing that explanation in our community. Of course you can encounter things like this, but the replacement policies of cpu caches have gotten a lot better and the caches have gotten bigger too. IME this kind of thing is typically dwarfed by much bigger variations from things like - cpu scheduling - whether the relevant pgbench thread is colocated on the same core as the relevant backend can make a huge difference, particularly when CPU power saving modes are not disabled. Just looking at tps from a fully cached readonly pgbench, with a single client: Power savings enabled, same core: 37493 Power savings enabled, different core: 28539 Power savings disabled, same core: 38167 Power savings disabled, different core: 37365 - can transparent huge pages be used for the executable mapping, or not On newer kernels linux (and some filesystems) can use huge pages for the executable. To what degree that succeeds is a large factor in performance. Single threaded read-only pgbench postgres mapped without huge pages: 37155 TPS with 2MB of postgres as huge pages: 37695 TPS with 6MB of postgres as huge pages: 42733 TPS The really annoying thing about this is that entirely unpredictable whether huge pages are used or not. Building the same way, sometimes 0, sometimes 2MB, sometimes 6MB are mapped huge. Even though the on-disk contents are precisely the same. And it can even change without rebuilding, if the binary is evicted from the page cache. This alone makes benchmarking extremely annoying. It basically can't be controlled and has huge effects. - How long has the server been started If e.g. once you run your benchmark on the first connection to a database, and after a restart not (e.g. autovacuum starts up beforehand), you can get a fairly different memory layout and cache situation, due to [not] using the relcache init file. If not, you'll have a catcache that's populated, otherwise not. Another mean one is whether you start your benchmark within a relatively short time of the server starting. 
Readonly pgbench with a single client, started immediately after the server: progress: 12.0 s, 37784.4 tps, lat 0.026 ms stddev 0.001, 0 failed progress: 13.0 s, 37779.6 tps, lat 0.026 ms stddev 0.001, 0 failed progress: 14.0 s, 37668.2 tps, lat 0.026 ms stddev 0.001, 0 failed progress: 15.0 s, 32133.0 tps, lat 0.031 ms stddev 0.113, 0 failed progress: 16.0 s, 37564.9 tps, lat 0.027 ms stddev 0.012, 0 failed progress: 17.0 s, 37731.7 tps, lat 0.026 ms stddev 0.001, 0 failed There's a dip at 15s, odd - turns out that's due to bgwriter writing a WAL record, which triggers walwriter to write it out and then initialize the whole of the WAL buffers with 0s - happens once. In this case I've exaggerated the effect a bit by using a 1GB wal_buffers, but it's visible otherwise too. Whether your benchmark period includes that dip or not adds a fair bit of noise. You can even see the effects of autovacuum workers launching - even if there's nothing to do! Not as a huge dip, but enough to add some "run to run" variation. - How much other dirty data is there in the kernel pagecache. If you e.g. just built a new binary, even with just minor changes, the kernel will need to flush those pages eventually, which may contend for IO and increase page faults. Rebuilding an optimized build generates something like 1GB of dirty data. Particularly with ccache, that'll typically not yet be flushed by the time you run a benchmark. That's not nothing, even with a decent NVMe SSD. - many more, unfortunately Greetings, Andres Freund
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Thu, Nov 9, 2023 at 5:43 PM Andrey Chudnovsky wrote: > Do you plan to support adding an extension hook to validate the token? > > It would allow a more efficient integration, than spinning up a separate process. I think an API in the style of archive modules would probably be a good way to go, yeah. It's probably not very high on the list of priorities, though, since the inputs and outputs are going to "look" the same whether you're inside or outside of the server process. The client side is going to need the bulk of the work/testing/validation. Speaking of which -- how is the current PQauthDataHook design doing when paired with MS AAD (er, Entra now I guess)? I haven't had an Azure test bed available for a while. Thanks, --Jacob
Re: pg_dump needs SELECT privileges on irrelevant extension table
> commit a70f2a57f233244c0a780829baf48c624187d456 > Author: Tom Lane > Date: Mon Nov 13 17:04:10 2023 -0500 > >Don't try to dump RLS policies or security labels for extension objects. (Thanks Tom!) --Jacob
Re: retire MemoryContextResetAndDeleteChildren backwards compatibility macro
Committed. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: On non-Windows, hard depend on uselocale(3)
On Thu, Nov 16, 2023 at 7:42 AM Dagfinn Ilmari Mannsåker wrote: > Tom Lane writes: > > > "Tristan Partin" writes: > >> I would like to propose removing HAVE_USELOCALE, and just have WIN32, > >> which means that Postgres would require uselocale(3) on anything that > >> isn't WIN32. > > > > You would need to do some research and try to prove that that won't > > be a problem on any modern platform. Presumably it once was a problem, > > or we'd not have bothered with a configure check. > > > > (Some git archaeology might yield useful info about when and why > > we added the check.) > > For reference, the Perl effort to use the POSIX.1-2008 thread-safe > locale APIs has revealed several platform-specific bugs that cause it > to disable them on FreeBSD and macOS: > > https://github.com/perl/perl5/commit/9cbc12c368981c56d4d8e40cc9417ac26bec2c35 Interesting that C vs C.UTF-8 has come up there, something that has also confused us and others (in fact I still owe Daniel Vérité a response to his complaint about how we treat the latter; I got stuck on a logical problem with the proposal and then dumped core...). The idea of C.UTF-8 is relatively new, and seems to have shaken a few bugs out in a few places. Anyway, that in particular is a brand new FreeBSD bug report and I am sure it will be addressed soon. > https://github.com/perl/perl5/commit/dd4eb78c55aab441aec1639b1dd49f88bd960831 As for macOS, one thing I noticed is that the FreeBSD -> macOS pipeline appears to have re-awoken after many years of slumber. I don't know anything about that other than that when I recently upgraded my Mac to 14.1, a few userland tools (tar, grep, ...?) suddenly turned out to be recentish FreeBSD versions, instead of something from the Jurassic. Whether that might apply to libc, who can say... they seemed to have quite ancient BSD locale code last time I checked. > https://github.com/perl/perl5/commit/0f3830f3997cf7ef1531bad26d2e0f13220dd862 That linked issue appears to be fixed already. > But Perl actually makes use of per-thread locales, because it has a > separate interpreter per thread, each of which can have a different > locale active. Since Postgres isn't actually multi-threaded (yet), > these concerns might not apply to the same degree. ECPG might use them in multi-threaded code. I'm not sure whether it's a problem, or whose problem it would be.
Re: Why do indexes and sorts use the database collation?
On Tue, 2023-11-14 at 13:01 +0100, Tomas Vondra wrote: > Presumably we'd not generate > incorrect results, but we'd not be able to use an index, causing performance > issues. Couldn't use the index for its pathkeys or range scans, but could use it for equality. > AFAICS this is a trade-off between known benefits (faster equality > searches, which are common for PK columns) vs. unknown downsides > (performance penalty for operations with unknown frequency). Don't forget the dependency versioning risk: changes to the collation provider library can corrupt your indexes. That affects even small tables where performance is not a concern. > Not sure it's a decision we can make automatically. But it's mostly > achievable manually, if the user specifies COLLATE "C" for the > column. Changing the column collation is the wrong place to do it, in my opinion. It conflates semantics with performance considerations and dependency risks. We already have the catalog support for indexes with a different collation: CREATE TABLE foo (t TEXT COLLATE "en_US"); INSERT INTO foo SELECT g::TEXT FROM generate_series(1,100) g; CREATE INDEX foo_idx ON foo (t COLLATE "C"); ANALYZE foo; The problem is that: EXPLAIN SELECT * FROM foo WHERE t = '345678'; doesn't use the index. And also that there's no way to do it for a PK index. > I realize you propose to do this automatically for everyone, I don't think I proposed that. Perhaps we nudge users in that direction over time as the utility becomes clear, but I'm not trying to push for a sudden radical change. Perhaps many read $SUBJECT as a rhetorical question, but I really do want to know if I am missing important and common use cases for indexes in a non-"C" collation. > But maybe there's a way > to make this manual approach more convenient? Say, by allowing the PK > to > have a different collation (which I don't think is allowed now). Yeah, that should be fairly non-controversial. > FWIW I wonder what the impact of doing this automatically would be in > practice. I mean, in my experience the number of tables with TEXT (or > types sensitive to collations) primary keys is fairly low, especially > for tables of non-trivial size (where the performance impact might be > measurable). I think a lot more users would be helped than hurt. But in absolute numbers, the latter group still represents a lot of regressions, so let's not do anything radical. > > > True. What about trying to allow a separate collation for the PK > constraint (and the backing index)? +1. > > > OK. I personally don't recall any case where I'd see a collation on > PK > indexes as a performance issue. Or maybe I just didn't realize it. Try a simple experiment building an index with the "C" collation and then try with a different locale on the same data. Numbers vary, but I've seen 1.5X to 4X on some simple generated data. Others have reported much worse numbers on some versions of glibc that are especially slow with lots of non-latin characters. > But speaking of natural keys, I recall a couple schemas with natural > keys in code/dimension tables, and it's not uncommon to cluster those > slow-moving tables once in a while. I don't know if ORDER BY queries > were very common on those tables, though. Yeah, not exactly a common case though. > > > OK, I think this answers my earlier question. 
> Now that I think about this, the one confusing thing with this syntax > is that it seems to assign the collation to the constraint, but in reality > we want the constraint to be enforced with the column's collation and the > alternative collation is for the index. Yeah, let's be careful about that. It's still technically correct: uniqueness in either collation makes sense. But it could be confusing anyway. Regards, Jeff Davis
Re: On non-Windows, hard depend on uselocale(3)
Dagfinn Ilmari Mannsåker writes: > Tom Lane writes: >> "Tristan Partin" writes: >>> I would like to propose removing HAVE_USELOCALE, and just have WIN32, >>> which means that Postgres would require uselocale(3) on anything that >>> isn't WIN32. >> You would need to do some research and try to prove that that won't >> be a problem on any modern platform. Presumably it once was a problem, >> or we'd not have bothered with a configure check. > For reference, the Perl effort to use the POSIX.1-2008 thread-safe > locale APIs has revealed several platform-specific bugs that cause it > to disable them on FreeBSD and macOS: > https://github.com/perl/perl5/commit/9cbc12c368981c56d4d8e40cc9417ac26bec2c35 > https://github.com/perl/perl5/commit/dd4eb78c55aab441aec1639b1dd49f88bd960831 > and work around bugs on others (e.g. OpenBSD): > https://github.com/perl/perl5/commit/0f3830f3997cf7ef1531bad26d2e0f13220dd862 > But Perl actually makes use of per-thread locales, because it has a > separate interpreter per thread, each of which can have a different > locale active. Since Postgres isn't actually multi-threaded (yet), > these concerns might not apply to the same degree. Interesting. That need not stop us from dropping the configure check for uselocale(), but it might be a problem for Tristan's larger ambitions. regards, tom lane
Re: On non-Windows, hard depend on uselocale(3)
Tom Lane writes: > "Tristan Partin" writes: >> I would like to propose removing HAVE_USELOCALE, and just have WIN32, >> which means that Postgres would require uselocale(3) on anything that >> isn't WIN32. > > You would need to do some research and try to prove that that won't > be a problem on any modern platform. Presumably it once was a problem, > or we'd not have bothered with a configure check. > > (Some git archaeology might yield useful info about when and why > we added the check.) For reference, the Perl effort to use the POSIX.1-2008 thread-safe locale APIs has revealed several platform-specific bugs that cause it to disable them on FreeBSD and macOS: https://github.com/perl/perl5/commit/9cbc12c368981c56d4d8e40cc9417ac26bec2c35 https://github.com/perl/perl5/commit/dd4eb78c55aab441aec1639b1dd49f88bd960831 and work around bugs on others (e.g. OpenBSD): https://github.com/perl/perl5/commit/0f3830f3997cf7ef1531bad26d2e0f13220dd862 But Perl actually makes use of per-thread locales, because it has a separate interpreter per thread, each of which can have a different locale active. Since Postgres isn't actually multi-threaded (yet), these concerns might not apply to the same degree. > regards, tom lane - ilmari
Re: On non-Windows, hard depend on uselocale(3)
On Thu, Nov 16, 2023 at 6:45 AM Tom Lane wrote: > "Tristan Partin" writes: > > I would like to propose removing HAVE_USELOCALE, and just have WIN32, > > which means that Postgres would require uselocale(3) on anything that > > isn't WIN32. > > You would need to do some research and try to prove that that won't > be a problem on any modern platform. Presumably it once was a problem, > or we'd not have bothered with a configure check. > > (Some git archaeology might yield useful info about when and why > we added the check.) According to data I scraped from the build farm, the last two systems we had that didn't have uselocale() were curculio (OpenBSD 5.9) and wrasse (Solaris 11.3), but those were both shut down (though wrasse still runs old branches) as they were well out of support. OpenBSD gained uselocale() in 6.2, and Solaris in 11.4, as part of the same suite of POSIX changes that we already required in commit 8d9a9f03. +1 for the change. https://man.openbsd.org/uselocale.3 https://docs.oracle.com/cd/E88353_01/html/E37843/uselocale-3c.html
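For anyone following along who hasn't used the API: unlike setlocale(3), which changes the locale process-wide, uselocale(3) switches it for the calling thread only. A minimal sketch of the POSIX.1-2008 pattern (my illustration, not code from the tree; on glibc the declarations need _GNU_SOURCE or _POSIX_C_SOURCE >= 200809L):

#include <locale.h>
#include <stdio.h>

int
main(void)
{
	/* locale object with LC_NUMERIC forced to "C", other categories default */
	locale_t	c_numeric = newlocale(LC_NUMERIC_MASK, "C", (locale_t) 0);

	if (c_numeric == (locale_t) 0)
		return 1;

	/* switch this thread's locale; the previous locale is returned */
	locale_t	old = uselocale(c_numeric);

	printf("%g\n", 1.5);		/* prints "1.5" regardless of the environment */

	/* restore the previous locale, then release the object */
	uselocale(old);
	freelocale(c_numeric);
	return 0;
}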
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Wed, Nov 15, 2023 at 3:58 PM Alvaro Herrera wrote: > > Translation-wise, this doesn't work, because you're building a string. > There's no reason to think that the words "logical" and "physical" > should stay untranslated; the message would make no sense, or at least > would be very ugly. > > You should do something like > > if (am_walsender) > { > ereport(log_replication_commands ? LOG : DEBUG1, > SlotIsLogical(s) ? errmsg("acquired logical replication slot > \"%s\"", NameStr(s->data.name)) : > errmsg("acquired physical replication slot \"%s\"", > NameStr(s->data.name))); > } This seems better, so done that way. > (Obviously, lose the "translator:" comments since they are unnecessary) The translator message now indicates that the remaining %s denotes the replication slot name. PSA v17 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com v17-0001-Log-messages-for-replication-slot-acquisition-an.patch Description: Binary data
Re: pg_upgrade and logical replication
On Mon, 13 Nov 2023 at 17:49, Amit Kapila wrote: > > On Mon, Nov 13, 2023 at 5:01 PM Amit Kapila wrote: > > > > On Fri, Nov 10, 2023 at 7:26 PM vignesh C wrote: > > > > > > Thanks for the comments, the attached v13 version patch has the > > > changes for the same. > > > > > > > + > > + ReplicationOriginNameForLogicalRep(subid, InvalidOid, originname, > > sizeof(originname)); > > + originid = replorigin_by_name(originname, false); > > + replorigin_advance(originid, sublsn, InvalidXLogRecPtr, > > +false /* backward */ , > > +false /* WAL log */ ); > > > > This seems to update the origin state only in memory. Is it sufficient > > to use this here? > > > > I think it is probably getting ensured by clean shutdown > (shutdown_checkpoint) which happens on the new cluster after calling > this function. We can probably try to add a comment for it. BTW, we > also need to ensure that max_replication_slots is configured to a > value higher than origins we are planning to create on the new > cluster. Added comments and also added the check for max_replication_slots. The attached v14 patch at [1] has the changes for the same. [1] - https://www.postgresql.org/message-id/CALDaNm20%3DBk_w9jDZXEqkJ3_NUAxOBswCn4jR-tmh-MqNpPZYw%40mail.gmail.com Regards, Vignesh
Re: pg_upgrade and logical replication
On Mon, 13 Nov 2023 at 17:02, Amit Kapila wrote: > > On Fri, Nov 10, 2023 at 7:26 PM vignesh C wrote: > > > > Thanks for the comments, the attached v13 version patch has the > > changes for the same. > > > > + > + ReplicationOriginNameForLogicalRep(subid, InvalidOid, originname, > sizeof(originname)); > + originid = replorigin_by_name(originname, false); > + replorigin_advance(originid, sublsn, InvalidXLogRecPtr, > +false /* backward */ , > +false /* WAL log */ ); > > This seems to update the origin state only in memory. Is it sufficient > to use this here? Anyway, I think using this requires us to first > acquire RowExclusiveLock on pg_replication_origin something the patch > is doing for some other system table. Added the lock. The attached v14 patch at [1] has the changes for the same. [1] - https://www.postgresql.org/message-id/CALDaNm20%3DBk_w9jDZXEqkJ3_NUAxOBswCn4jR-tmh-MqNpPZYw%40mail.gmail.com Regards, Vignesh
Re: Some performance degradation in REL_16 vs REL_15
Hi, On 2023-11-15 11:33:44 +0300, Anton A. Melnikov wrote: > The configure options and test scripts on my pc and server were the same: > export CFLAGS="-O2" > ./configure --enable-debug --with-perl --with-icu --enable-depend > --enable-tap-tests > #reinstall > #reinitdb > #create database bench > for ((i=0; i<100; i++)); do pgbench -U postgres -i -s8 bench> /dev/null 2>&1; > psql -U postgres -d bench -c "checkpoint"; RES=$(pgbench -U postgres -c6 -T20 > -j6 bench; Even with scale 8 you're likely significantly impacted by contention. And obviously WAL write latency. See below for why that matters. > I can't understand why i get the opposite results on my pc and on the server. > It is clear that the absolute > TPS values will be different for various configurations. This is normal. But > differences? > Is it unlikely that some kind of reference configuration is needed to > accurately > measure the difference in performance. Probably something wrong with my pc, > but now > i can not figure out what's wrong. One very common reason for symptoms like this is power-saving measures by the CPU. In workloads where the CPU is not meaningfully utilized, the CPU will go into a powersaving mode - which can cause workloads that are latency sensitive to be badly affected. Both because initially the cpu will just work at a lower frequency and because it takes time to shift to a higher frequency. Here's an example: I bound the server and psql to the same CPU core (nothing else is allowed to use that core), and ran the following: \o /dev/null SELECT 1; SELECT 1; SELECT 1; SELECT pg_sleep(0.1); SELECT 1; SELECT 1; SELECT 1; Time: 0.181 ms Time: 0.085 ms Time: 0.071 ms Time: 100.474 ms Time: 0.153 ms Time: 0.077 ms Time: 0.069 ms You can see how the first query timing was slower, the next two were faster, and then after the pg_sleep() it's slow again. # tell the CPU to optimize for performance not power cpupower frequency-set --governor performance # disable going to lower power states cpupower idle-set -D0 # disable turbo mode for consistent performance echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo Now the timings are: Time: 0.038 ms Time: 0.028 ms Time: 0.025 ms Time: 1000.262 ms (00:01.000) Time: 0.027 ms Time: 0.024 ms Time: 0.023 ms Look, fast and reasonably consistent timings. Switching back: Time: 0.155 ms Time: 0.123 ms Time: 0.074 ms Time: 1001.235 ms (00:01.001) Time: 0.120 ms Time: 0.077 ms Time: 0.068 ms The perverse thing is that this often means that *reducing* the number of instructions executed yields *worse* behaviour when under non-sustained load, because from the CPU's point of view there is less need to increase clock speed. To show how much of a difference that can make, I ran pgbench with a single client on one core, and the server on another (so the CPU is idle in between): numactl --physcpubind 11 pgbench -n -M prepared -P1 -S -c 1 -T10 With power optimized configuration: latency average = 0.035 ms latency stddev = 0.002 ms initial connection time = 5.255 ms tps = 28434.334672 (without initial connection time) With performance optimized configuration: latency average = 0.025 ms latency stddev = 0.001 ms initial connection time = 3.544 ms tps = 40079.995935 (without initial connection time) That's a whopping 1.4x in throughput! 
Now, the same thing, except that I used a custom workload where pgbench transactions are executed in a pipelined fashion, 100 read-only transactions in one script execution: With power optimized configuration: latency average = 1.055 ms latency stddev = 0.125 ms initial connection time = 6.915 ms tps = 947.985286 (without initial connection time) (this means we actually executed 94798.5286 readonly pgbench transactions/s) With performance optimized configuration: latency average = 1.376 ms latency stddev = 0.083 ms initial connection time = 3.759 ms tps = 726.849018 (without initial connection time) Suddenly the super-duper performance optimized settings are worse (but note that stddev is down)! I suspect the problem is that now, because we disabled idle states, the cpu ends up clocking *lower*, due to power usage. If I just change the relevant *cores* to the performance optimized configuration: cpupower -c 10,11 idle-set -D0; cpupower -c 10,11 frequency-set --governor performance latency average = 0.940 ms latency stddev = 0.061 ms initial connection time = 3.311 ms tps = 1063.719116 (without initial connection time) It wins again. Now, realistically you'd never use -D0 (i.e. disabling all downclocking, not just lower states) - the power differential is quite big and as shown here it can hurt performance as well. On an idle system, looking at the cpu power usage with: powerstat -D -R 5 1000 Time User Nice Sys Idle IO Run Ctxt/s IRQ/s Fork Exec Exit Watts pkg-0 dram pkg-1 09:45:03 0.6 0.0 0.2 99.2 0.01 2861 2823000 46.84 24.82 3.68 18.33 09:45:08 1.0 0.0 0.1 99.0
Re: pg_upgrade and logical replication
On Mon, 13 Nov 2023 at 13:52, Peter Smith wrote: > > Here are some review comments for patch v13-0001 > > == > src/bin/pg_dump/pg_dump.c > > 1. getSubscriptionTables > > + int i_srsublsn; > + int i; > + int cur_rel = 0; > + int ntups; > > What is the difference between 'i' and 'cur_rel'? > > AFAICT these represent the same tuple index, in which case you might > as well throw away 'cur_rel' and only keep 'i'. Modified > ~~~ > > 2. getSubscriptionTables > > + for (i = 0; i < ntups; i++) > + { > + Oid cur_srsubid = atooid(PQgetvalue(res, i, i_srsubid)); > + Oid relid = atooid(PQgetvalue(res, i, i_srrelid)); > + TableInfo *tblinfo; > > Since this is all new code, using C99 style for loop variable > declaration of 'i' will be better. Modified > == > src/bin/pg_upgrade/check.c > > 3. check_for_subscription_state > > +check_for_subscription_state(ClusterInfo *cluster) > +{ > + int dbnum; > + FILE*script = NULL; > + char output_path[MAXPGPATH]; > + int ntup; > + > + /* Subscription relations state can be migrated since PG17. */ > + if (GET_MAJOR_VERSION(old_cluster.major_version) < 1700) > + return; > + > + prep_status("Checking for subscription state"); > + > + snprintf(output_path, sizeof(output_path), "%s/%s", > + log_opts.basedir, > + "subscription_state.txt"); > > I felt this filename ought to be more like > 'subscriptions_with_bad_state.txt' because the current name looks like > a normal logfile with nothing to indicate that it is only for the states of the "bad" subscriptions. I have intentionally kept the file name short, as we noticed some buildfarm failures caused by long names when the upgrade-of-publisher patch used a longer one. > ~~~ > > 4. > + for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++) > + { > > Since this is all new code, using C99 style for loop variable > declaration of 'dbnum' will be better. Modified > ~~~ > > 5. > + * a) SUBREL_STATE_DATASYNC:A relation upgraded while in this state > + * would retain a replication slot, which could not be dropped by the > + * sync worker spawned after the upgrade because the subscription ID > + * tracked by the publisher does not match anymore. > > missing whitespace > > /SUBREL_STATE_DATASYNC:A relation/SUBREL_STATE_DATASYNC: A relation/ Modified Also added a couple of missing test cases. The attached v14 version patch has the changes for the same. Regards, Vignesh
The new SQL binary_upgrade_add_sub_rel_state function has the following syntax: SELECT binary_upgrade_add_sub_rel_state(subname text, relid oid, state char [,sublsn pg_lsn]) In the above, subname is the subscription name, relid is the relation identifier, the state is the state of the relation, sublsn is subscription lsn which is optional, and defaults to NULL/InvalidXLogRecPtr if not provided. pg_dump will retrieve these values(subname, relid, state and sublsn) from the old cluster. The subscription's replication origin is needed to ensure that we don't replicate anything twice. To fix this problem, this patch teaches pg_dump to update the replication origin along with create subscription by using binary_upgrade_replorigin_advance SQL function to restore the underlying replication origin remote LSN. This is supported only in binary upgrade mode. The new SQL binary_upgrade_replorigin_advance function has the following syntax: SELECT binary_upgrade_replorigin_advance(subname text, sublsn pg_lsn) In the above, subname is the subscription name and sublsn is subscription lsn. pg_dump will retrieve these values(subname and sublsn) from the old cluster. pg_upgrade will check that all the subscription relations are in 'i' (init), 's' (data sync) or in 'r' (ready) state, and will error out if that's not the case, logging the reason for the failure. Author: Julien Rouhaud, Vignesh C Reviewed-by: FIXME Discussion: https://postgr.es/m/20230217075433.u5mjly4d5cr4hcfe@jrouhaud --- doc/src/sgml/ref/pgupgrade.sgml| 72 src/backend/utils/adt/pg_upgrade_support.c | 130 +++ src/bin/pg_dum
Re: Potential use-after-free in partion related code
On 2023-Nov-15, Andres Freund wrote: > partConstraint = list_concat(partBoundConstraint, > > RelationGetPartitionQual(rel)); > > At this point partBoundConstraint may not be used anymore, because > list_concat() might have reallocated. > > But then a few lines later: > > /* we already hold a lock on the default partition */ > defaultrel = table_open(defaultPartOid, NoLock); > defPartConstraint = > get_proposed_default_constraint(partBoundConstraint); > > We use partBoundConstraint again. Yeah, this is wrong if partBoundConstraint is reallocated by list_concat. One possible fix is to change list_concat to list_concat_copy(), which leaves the original list unmodified. AFAICT the bug came in with 6f6b99d1335b, which added default partitions. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "Now I have my system running, not a byte was off the shelf; It rarely breaks and when it does I fix the code myself. It's stable, clean and elegant, and lightning fast as well, And it doesn't cost a nickel, so Bill Gates can go to hell."
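For concreteness, a minimal sketch of that suggestion (the fix that actually gets committed may of course differ):

/* In ATExecAttachPartition(): build partConstraint without modifying
 * partBoundConstraint, so that the later call still sees a valid list. */
partConstraint = list_concat_copy(partBoundConstraint,
								  RelationGetPartitionQual(rel));

...

/* we already hold a lock on the default partition */
defaultrel = table_open(defaultPartOid, NoLock);
defPartConstraint =
	get_proposed_default_constraint(partBoundConstraint);	/* still valid */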
Re: On non-Windows, hard depend on uselocale(3)
"Tristan Partin" writes: > I would like to propose removing HAVE_USELOCALE, and just have WIN32, > which means that Postgres would require uselocale(3) on anything that > isn't WIN32. You would need to do some research and try to prove that that won't be a problem on any modern platform. Presumably it once was a problem, or we'd not have bothered with a configure check. (Some git archaeology might yield useful info about when and why we added the check.) regards, tom lane
Re: pg_basebackup check vs Windows file path limits
On 2023-11-15 We 06:34, Alvaro Herrera wrote: On 2023-Nov-13, Andrew Dunstan wrote: size? https://learn.microsoft.com/en-us/windows/win32/api/ioapiset/nf-ioapiset-deviceiocontrol Hmm, here's what that page says - I can't see it saying what you're suggesting here - am I missing something?: I don't think so. I think I just confused myself. Reading the docs it appears that other Windows APIs work as I described, but not this one. Anyway, after looking at it a bit more, I realized that this code uses MAX_PATH as basis for its buffer's length limit -- and apparently on Windows that's only 260, much shorter than MAXPGPATH (1024) which our own code uses to limit the buffers given to readlink(). So maybe fixing this is just a matter of doing s/MAX_PATH/MAXPGPATH/ in dirmod.c. I'll test it. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
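To make the size mismatch concrete, here is a hedged illustration (not the actual dirmod.c code) of how readlink() callers elsewhere in the tree size their buffers:

#include <unistd.h>

#define MAXPGPATH 1024			/* as in src/include/pg_config_manual.h */

/* Illustration only: the rest of the tree gives readlink() a
 * MAXPGPATH-sized buffer, so a Windows implementation that caps its
 * internal buffer at MAX_PATH (260) can fail on paths the rest of the
 * code considers perfectly legal. */
static int
read_link_into(const char *path, char *buf)		/* buf is MAXPGPATH bytes */
{
	int			len = (int) readlink(path, buf, MAXPGPATH - 1);

	if (len < 0)
		return -1;				/* caller inspects errno */
	buf[len] = '\0';			/* readlink() does not NUL-terminate */
	return len;
}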
Re: remaining sql/json patches
On 2023-11-15 09:11:19 -0800, Andres Freund wrote: > On 2023-11-15 22:00:41 +0900, Amit Langote wrote: > > > This causes a nontrivial increase in the size of the parser (~5% in an > > > optimized build here), I wonder if we can do better. > > > > Hmm, sorry if I sound ignorant but what do you mean by the parser here? > > gram.o, in an optimized build. Or, hm, maybe I meant the size of the generated gram.c actually. Either is worth looking at.
Re: remaining sql/json patches
Hi, Thanks, this looks like a substantial improvement. I don't quite have time to look right now, but I thought I'd answer one question below. On 2023-11-15 22:00:41 +0900, Amit Langote wrote: > > This causes a nontrivial increase in the size of the parser (~5% in an > > optimized build here), I wonder if we can do better. > > Hmm, sorry if I sound ignorant but what do you mean by the parser here? gram.o, in an optimized build. > I can see that the byte-size of gram.o increases by 1.66% after the > above additions (1.72% with previous versions). I'm not sure anymore how I measured it, but if you just looked at the total file size, that might not show the full gain, because of debug symbols etc. You can use the size command to look at just the code and data size. > I've also checked > using log_parser_stats that there isn't much slowdown in the > raw-parsing speed. What does "isn't much slowdown" mean in numbers? Greetings, Andres Freund
Potential use-after-free in partion related code
Hi, A while back I had proposed annotations for palloc() et al that let the compiler know about which allocators pair with what freeing functions. One thing that allows the compiler to do is to detect use after free. One such complaint is: ../../../../../home/andres/src/postgresql/src/backend/commands/tablecmds.c: In function ‘ATExecAttachPartition’: ../../../../../home/andres/src/postgresql/src/backend/commands/tablecmds.c:18758:25: warning: pointer ‘partBoundConstraint’ may be used after ‘list_concat’ [-Wuse-after-free] 18758 | get_proposed_default_constraint(partBoundConstraint); | ^~~~ ../../../../../home/andres/src/postgresql/src/backend/commands/tablecmds.c:18711:26: note: call to ‘list_concat’ here 18711 | partConstraint = list_concat(partBoundConstraint, | ^~~~ 18712 | RelationGetPartitionQual(rel)); | ~~ And it seems quite right: partConstraint = list_concat(partBoundConstraint, RelationGetPartitionQual(rel)); At this point partBoundConstraint may not be used anymore, because list_concat() might have reallocated. But then a few lines later: /* we already hold a lock on the default partition */ defaultrel = table_open(defaultPartOid, NoLock); defPartConstraint = get_proposed_default_constraint(partBoundConstraint); We use partBoundConstraint again. I unfortunately can't identify quickly enough what partConstraint, defPartConstraint and partBoundConstraint are, so I don't really know what the fix here is. Greetings, Andres Freund
Re: Explicitly skip TAP tests under Meson if disabled
On 2023-11-15 11:02:19 +0100, Peter Eisentraut wrote: > On 04.11.23 01:51, Andres Freund wrote: > > I'd just use a single test() invocation here, and add an argument to > > testwrap > > indicating that it should print out the skipped message. That way we a) > > don't > > need two test() invocations, b) could still see the test name etc in the > > test > > invocation. > > Here is a patch that does it that way. WFM!
Re: BUG #18097: Immutable expression not allowed in generated at
Hi, > True, but from the perspective of the affected code, the question is > basically "did you call expression_planner() yet". So I like this > naming for that connection, whereas something based on "transformation" > doesn't really connect to anything in existing function names. Fair enough. -- Best regards, Aleksander Alekseev
Re: retire MemoryContextResetAndDeleteChildren backwards compatibility macro
On Wed, Nov 15, 2023 at 09:27:18AM +0530, Amul Sul wrote: > Nevermind, I usually use git apply or git am, here are those errors: > > PG/ - (master) $ git apply ~/Downloads/retire_compatibility_macro_v1.patch > error: patch failed: src/backend/access/brin/brin.c:297 > error: src/backend/access/brin/brin.c: patch does not apply I wonder if your mail client is modifying the patch. Do you have the same issue if you download it from the archives? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: typo in fallback implementation for pg_atomic_test_set_flag()
On Tue, Nov 14, 2023 at 07:17:32PM -0800, Andres Freund wrote: > Are you planning to apply the fix? Yes, I'll take care of it. >> I'd ordinarily suggest removing this section of code since it doesn't seem >> to have gotten much coverage > > Which section precisely? The lines below this: /* * provide fallback for test_and_set using atomic_exchange if available */ #if !defined(PG_HAVE_ATOMIC_TEST_SET_FLAG) && defined(PG_HAVE_ATOMIC_EXCHANGE_U32) but above this: /* * provide fallback for test_and_set using atomic_compare_exchange if * available. */ #elif !defined(PG_HAVE_ATOMIC_TEST_SET_FLAG) && defined(PG_HAVE_ATOMIC_COMPARE_EXCHANGE_U32) >> but I'm actually looking into adding some faster atomic-exchange >> implementations that may activate this code for certain >> compiler/architecture combinations. > > Hm. I don't really see how adding a faster atomic-exchange implementation > could trigger this implementation being used? That'd define PG_HAVE_ATOMIC_EXCHANGE_U32, so this fallback might be used if PG_HAVE_ATOMIC_TEST_SET_FLAG is not defined. I haven't traced through all the #ifdefs that lead to this point exhaustively, though, so perhaps this is still unlikely. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
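For context, the fallback in question builds test-and-set out of atomic exchange. Conceptually it amounts to this (a C11 sketch of the idea, not the literal contents of the atomics headers):

#include <stdatomic.h>
#include <stdbool.h>

/* Test-and-set via exchange: unconditionally store 1 and inspect the
 * old value.  We "won" the flag iff it was previously 0. */
static inline bool
test_set_flag_via_exchange(atomic_uint *flag)
{
	return atomic_exchange(flag, 1) == 0;
}

/* Clearing the flag is just an atomic store of 0. */
static inline void
clear_flag_via_store(atomic_uint *flag)
{
	atomic_store(flag, 0);
}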
Re: BUG #18097: Immutable expression not allowed in generated at
Aleksander Alekseev writes: >>> Oh no! We encountered one of the most difficult problems in computer >>> science [1]. >> Indeed :-(. Looking at it again this morning, I'm thinking of >> using "contain_mutable_functions_after_planning" --- what do you >> think of that? > It's better but creates an impression that the actual planning will be > involved. True, but from the perspective of the affected code, the question is basically "did you call expression_planner() yet". So I like this naming for that connection, whereas something based on "transformation" doesn't really connect to anything in existing function names. regards, tom lane
Re: Fix documentation for pg_stat_statements JIT deform_counter
On Wed, Nov 15, 2023 at 01:53:13PM +0100, Daniel Gustafsson wrote: > > On 11 Nov 2023, at 10:26, Julien Rouhaud wrote: > > > I was adding support for the new pg_stat_statements JIT deform_counter in > > PoWA > > when I realized that those were added after jit_generation_time in the > > documentation while they're actually at the end of the view. > > Nice catch, that was indeed an omission in the original commit. Thanks for > the > patch, I'll apply that shortly. Thanks!
Re: Some performance degradation in REL_16 vs REL_15
"Anton A. Melnikov" writes: > I can't understand why i get the opposite results on my pc and on the server. > It is clear that the absolute > TPS values will be different for various configurations. This is normal. But > differences? > Is it unlikely that some kind of reference configuration is needed to > accurately > measure the difference in performance. Probably something wrong with my pc, > but now > i can not figure out what's wrong. > Would be very grateful for any advice or comments to clarify this problem. Benchmarking is hard :-(. IME it's absolutely typical to see variations of a couple of percent even when "nothing has changed", for example after modifying some code that's nowhere near any hot code path for the test case. I usually attribute this to cache effects, such as a couple of bits of hot code now sharing or not sharing a cache line. If you use two different compiler versions then that situation is likely to occur all over the place even with exactly the same source code. NUMA creates huge reproducibility problems too on multisocket machines (which your server is IIUC). When I had a multisocket workstation I'd usually bind all the server processes to one socket if I wanted more-or-less-repeatable numbers. I wouldn't put a lot of faith in the idea that measured pgbench differences of up to several percent are meaningful at all, especially when comparing across different hardware and different OS+compiler versions. There are too many variables that have little to do with the theoretical performance of the source code. regards, tom lane
Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'
On Wed, Nov 15, 2023 at 8:02 AM Andres Freund wrote: > On 2023-11-14 19:14:57 +0800, Richard Guo wrote: > > While working on BUG #18187 [1], I noticed that we also have issues with > > how SJE replaces join clauses involving the removed rel. As an example, > > consider the query below, which would trigger an Assert. > > > > create table t (a int primary key, b int); > > > > explain (costs off) > > select * from t t1 > >inner join t t2 on t1.a = t2.a > > left join t t3 on t1.b > 1 and t1.b < 2; > > server closed the connection unexpectedly > > > > The Assert failure happens in remove_self_join_rel() when we're trying > > to remove t1. The two join clauses of t1, 't1.b > 1' and 't1.b < 2', > > share the same pointer of 'required_relids', which is {t1, t3} at first. > > After we've performed replace_varno for the first clause, the > > required_relids becomes {t2, t3}, which is no problem. However, the > > second clause's required_relids also becomes {t2, t3}, because they are > > actually the same pointer. So when we proceed with replace_varno on the > > second clause, we'd trigger the Assert. > > Good catch. > > > > Off the top of my head I'm thinking that we can fix this kind of issue > > by bms_copying the bitmapset first before we make a substitution in > > replace_relid(), like attached. > > > > Alternatively, we can revise distribute_qual_to_rels() as below so that > > different RestrictInfos don't share the same pointer of required_relids. > > > --- a/src/backend/optimizer/plan/initsplan.c > > +++ b/src/backend/optimizer/plan/initsplan.c > > @@ -2385,7 +2385,7 @@ distribute_qual_to_rels(PlannerInfo *root, Node > > *clause, > > * nonnullable-side rows failing the qual. > > */ > > Assert(ojscope); > > - relids = ojscope; > > + relids = bms_copy(ojscope); > > Assert(!pseudoconstant); > > } > > else > > > > With this way, I'm worrying that there are other places where we should > > avoid sharing the same pointer to Bitmapset structure. > > Indeed. > > > > I'm not sure how to discover all these places. Any thoughts? > > At the very least I think we should add a mode to bitmapset.c where > every modification of a bitmapset reallocates, rather than just when the size > actually changes. Because we only reallocate and free in relatively uncommon > cases, particularly on 64bit systems, it's very easy to not find spots that > continue to use the input pointer to one of the modifying bms functions. > > A very hacky implementation of that indeed catches this bug with the existing > regression tests. > > The tests do *not* pass with just the attached applied, as the "Delete relid > without substitution" path has the same issue. With that also copying and all > the "reusing" bms* functions always reallocating, the tests pass - kinda. > > > The "kinda" is because there are callers to bms_(add|del)_members() that pass the > same bms as a and b, which only works if the reallocation happens "late". +1, neat idea. I'm willing to work on this. Will propose the patch soon. -- Regards, Alexander Korotkov
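To sketch the idea (a hypothetical sketch; the macro name and helper are illustrative, not the eventual patch): every in-place-modifying bms_* function would move its input up front, so callers that keep using a stale pointer fail reliably instead of only when the set happens to grow.

#ifdef REALLOCATE_BITMAPSETS		/* hypothetical debug option */
/* Force the input set to move to a fresh allocation. */
static Bitmapset *
bms_force_realloc(Bitmapset *a)
{
	Bitmapset  *b;

	if (a == NULL)
		return NULL;
	b = bms_copy(a);			/* fresh allocation */
	pfree(a);					/* old pointer now dangles */
	return b;
}
#endif

Each modifying function (bms_add_member(), bms_del_members(), ...) would then start with a = bms_force_realloc(a); that would also surface the callers Andres mentions that pass the same set as both arguments.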
Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'
On Wed, Nov 15, 2023 at 8:04 AM Andres Freund wrote: > > On 2023-11-14 14:42:13 +0200, Alexander Korotkov wrote: > > It's possibly a dumb option, but what about just removing the assert? > > That's not at all an option - the in-place bms_* functions can free their > input. So a dangling pointer to the "old" version is a use-after-free waiting > to happen - you just need a query that actually gets to bitmapsets that are a > bit larger. Yeah, now I get it, thank you. I was under the wrong impression that a bitmapset has a level of indirection, so that the pointer remains valid. Now I see that bitmapset manipulation functions can free/repalloc, making the previous bitmapset pointer invalid. -- Regards, Alexander Korotkov
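A minimal illustration of that hazard (mine, not from the thread):

/* bms_add_member() may repalloc() its argument when the set must grow,
 * leaving any other copy of the original pointer dangling. */
Bitmapset  *a = bms_make_singleton(1);
Bitmapset  *b = a;				/* second copy of the same pointer */

a = bms_add_member(a, 1000);	/* set must grow: the words may move */
/* 'b' may now point at freed memory; only 'a' is safe to use */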
Re: Tab completion for CREATE TABLE ... AS
Le 15/11/2023 à 03:58, Michael Paquier a écrit : On Thu, Nov 02, 2023 at 07:27:02PM +0300, Gilles Darold wrote: Look like the tab completion for CREATE TABLE ... AS is not proposed. + /* Complete CREATE TABLE AS with list of keywords */ + else if (TailMatches("CREATE", "TABLE", MatchAny, "AS") || +TailMatches("CREATE", "TEMP|TEMPORARY|UNLOGGED", "TABLE", MatchAny, "AS")) + COMPLETE_WITH("SELECT", "WITH"); There is a bit more than SELECT and WITH as possible query for a CTAS. How about VALUES, TABLE or even EXECUTE (itself able to handle a SELECT, TABLE or VALUES)? -- Michael Right, I don't know how I have missed the sql-createtableas page in the documentation. Patch v2 fixes the keyword list; I have also sorted the CREATE TABLE completions into alphabetical order (AS was at the end of the list). It has also been rebased on current master. -- Gilles Darold http://www.darold.net/ diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c index a1dfc11f6b..e6c88cf1a1 100644 --- a/src/bin/psql/tab-complete.c +++ b/src/bin/psql/tab-complete.c @@ -2501,11 +2501,15 @@ psql_completion(const char *text, int start, int end) /* Complete CREATE TABLE with '(', OF or PARTITION OF */ else if (TailMatches("CREATE", "TABLE", MatchAny) || TailMatches("CREATE", "TEMP|TEMPORARY|UNLOGGED", "TABLE", MatchAny)) - COMPLETE_WITH("(", "OF", "PARTITION OF"); + COMPLETE_WITH("(", "AS", "OF", "PARTITION OF"); /* Complete CREATE TABLE OF with list of composite types */ else if (TailMatches("CREATE", "TABLE", MatchAny, "OF") || TailMatches("CREATE", "TEMP|TEMPORARY|UNLOGGED", "TABLE", MatchAny, "OF")) COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_composite_datatypes, NULL); + /* Complete CREATE TABLE AS with list of keywords */ + else if (TailMatches("CREATE", "TABLE", MatchAny, "AS") || + TailMatches("CREATE", "TEMP|TEMPORARY|UNLOGGED", "TABLE", MatchAny, "AS")) + COMPLETE_WITH("EXECUTE", "SELECT", "TABLE", "VALUES", "WITH"); /* Complete CREATE TABLE name (...) with supported options */ else if (TailMatches("CREATE", "TABLE", MatchAny, "(*)") || TailMatches("CREATE", "UNLOGGED", "TABLE", MatchAny, "(*)"))
Re: Allow tests to pass in OpenSSL FIPS mode
> On 15 Nov 2023, at 12:44, Peter Eisentraut wrote: > > On 15.11.23 00:07, Tom Lane wrote: >> I'm more concerned about the 3DES situation. Fedora might be a bit >> ahead of the curve here, but according to the link above, everybody is >> supposed to be in compliance by the end of 2023. So I'd be inclined >> to guess that the 3DES-is-rejected case is going to be mainstream >> before v17 ships. > > Right. It is curious that I have not found any activity in the OpenSSL issue > trackers about this. But if you send me your results file, then I can > include it in the patch as an alternative expected. As NIST SP800-131A allows decryption with 3DES and DES, I don't think OpenSSL will do much other than move it to the legacy module where it can be used opt-in like DES. SKIPJACK has been disallowed since earlier, but decryption with it is still tested during FIPS validation. Using an alternative results file to handle platforms which explicitly remove disallowed ciphers seems like the right choice. Since the 3DES/DES deprecations aren't limited to FIPS, do we want to do anything for pgcrypto where we have DES/3DES encryption? Maybe a doc patch which mentions the deprecation with a link to the SP could be in order? -- Daniel Gustafsson
Re: [PATCH] pgbench log file headers
Hello On Mon, Nov 13, 2023 at 6:01 PM Andres Freund wrote: > Hi, > > On 2023-11-13 11:55:07 -0600, Adam Hendel wrote: > > Currently, pgbench will log individual transactions to a logfile when the > > `--log` parameter flag is provided. The logfile, however, does not > include > > column header. It has become a fairly standard expectation of users to > have > > column headers present in flat files. Without the header in the pgbench > log > > files, new users must navigate to the docs and piece together the column > > headers themselves. Most industry leading frameworks have tooling built > in > > to read column headers though, for example python/pandas read_csv(). > > The disadvantage of doing that is that a single pgbench run with --log will > generate many files when using -j, to avoid contention. The easiest way to > deal with that after the run is to cat all the log files to a larger one. > If > you do that with headers present, you obviously have a few bogus rows (the > heades from the various files). > > I guess you could avoid the "worst" of that by documenting that the > combined > log file should be built by > cat {$log_prefix}.${pid} {$log_prefix}.${pid}.* > and only outputting the header in the file generated by the main thread. > > > We can improve the experience for users by adding column headers to > pgbench > > logfiles with an optional command line flag, `--log-header`. This will > keep > > the API backwards compatible by making users opt-in to the column > headers. > > It follows the existing pattern of having conditional flags in pgbench’s > > API; the `--log` option would have both –log-prefix and –log-header if > this > > work is accepted. > > > The implementation considers the column headers only when the > > `--log-header` flag is present. The values for the columns are taken > > directly from the “Per-Transaction Logging” section in > > https://www.postgresql.org/docs/current/pgbench.html and takes into > account > > the conditional columns `schedule_lag` and `retries`. > > > > > > Below is an example of what that logfile will look like: > > > > > > pgbench postgres://postgres:postgres@localhost:5432/postgres --log > > --log-header > > > > client_id transaction_no time script_no time_epoch time_us > > Independent of your patch, but we imo ought to combine time_epoch time_us > in > the log output. There's no point in forcing consumers to combine those > fields, > and it makes logging more expensive... And if we touch this, we should > just > switch to outputting nanoseconds instead of microseconds. > > It also is quite odd that we have "time" and "time_epoch", "time_us", where > time is the elapsed time of an individual "transaction" and time_epoch + > time_us together are a wall-clock timestamp. Without differentiating > between > those, the column headers seem not very useful, because one needs to look > in > the documentation to understand the fields. > I don't think there's all that strong a need for backward compatibility in > pgbench. We could just change the columns as I suggest above and then > always > emit the header in the "main" log file. > > Do you think this should be done in separate patches? First patch: log the column headers in the "main" log file. No --log-header flag, make it the default behavior of --log. Next patch: collapse "time_epoch" and "time_us" in the log output and give the "time" column a name that is more clear. Adam > Greetings, > > Andres Freund >
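A hedged sketch of what that first patch might look like (the function name and placement are illustrative, not taken from the actual patch):

/* Emit the column header only into the log file owned by the main
 * thread (tid 0), so that concatenating the per-thread log files does
 * not scatter bogus header rows through the combined output. */
static void
write_log_header_if_main(FILE *logfile, int tid)
{
	if (tid == 0)
		fprintf(logfile,
				"client_id transaction_no time script_no time_epoch time_us\n");
}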
Re: trying again to get incremental backup
Hi Robert, [..spotted the v9 patchset..] so I've still spent some time playing with patchset v8 (without the 6/6 testing patch related to wal_level=minimal); the exceptions checked against patchset v9 are marked as such. 1. At compile time there were 2 warnings about shadowed variables (at least with gcc version 10.2.1), but on v9 that is fixed: blkreftable.c: In function ‘WriteBlockRefTable’: blkreftable.c:520:24: warning: declaration of ‘brtentry’ shadows a previous local [-Wshadow=compatible-local] walsummarizer.c: In function ‘SummarizeWAL’: walsummarizer.c:833:36: warning: declaration of ‘private_data’ shadows a previous local [-Wshadow=compatible-local] 2. Usability thing: I hit the timeout hard: "This backup requires WAL to be summarized up to 0/9D8, but summarizer has only reached 0/0." with summarize_wal=off (default), but apparently this is in the TODO. Looks like an important usability thing. 3. I've verified that if the DB was in wal_level=minimal even temporarily (and thus summarization was disabled) it is impossible to take an incremental backup: pg_basebackup: error: could not initiate base backup: ERROR: WAL summaries are required on timeline 1 from 0/7D8 to 0/1028, but the summaries for that timeline and LSN range are incomplete DETAIL: The first unsummarized LSN is this range is 0/D04AE88. 4. As we have discussed off list, there is (was) this pg_combinebackup bug in v8's reconstruct_from_incremental_file() where it was unable to realize that - in case of combining multiple incremental backups - it should stop looking for the previous instance of the full file (while it was fine with v6 of the patchset). I've checked it on v9 - it is good now. 5. On v8 I've finally played a little bit with standby(s) and this patchset with a couple of basic scenarios while mixing the source of the backups: a. full on standby, incr1 on standby, full db restore (incl. incr1) on standby # sometimes I'm getting spurious errors like this when doing incrementals on standby with -c fast: 2023-11-15 13:49:05.721 CET [10573] LOG: recovery restart point at 0/A28 2023-11-15 13:49:07.591 CET [10597] WARNING: aborting backup due to backend exiting before pg_backup_stop was called 2023-11-15 13:49:07.591 CET [10597] ERROR: manifest requires WAL from final timeline 1 ending at 0/AF8, but this backup starts at 0/A28 2023-11-15 13:49:07.591 CET [10597] STATEMENT: BASE_BACKUP ( INCREMENTAL, LABEL 'pg_basebackup base backup', PROGRESS, CHECKPOINT 'fast', WAIT 0, MANIFEST 'yes', TARGET 'client') # when you retry the same pg_basebackup it goes fine (looks like a CHECKPOINT-on-standby/restartpoint <-> summarizer disconnect; I'll dig deeper tomorrow. It seems that issuing "CHECKPOINT; pg_sleep(1);" against the primary just before pg_basebackup --incr on the standby works around it) b. full on primary, incr1 on standby, full db restore (incl. incr1) on standby # WORKS c. full on standby, incr1 on standby, full db restore (incl. incr1) on primary # WORKS* d. full on primary, incr1 on standby, full db restore (incl. incr1) on primary # WORKS* * - needs pg_promote() due to the controlfile having the standby bit set + potential fiddling with postgresql.auto.conf as it contains the primary_conninfo GUC. 6. Sci-fi-mode-on: I was wondering about the dangers of e.g. a more recent pg_basebackup (e.g. from pg18 one day) running against pg17, given this incremental backup capability. Is it going to be safe? 
(currently there seem to be no safeguards against such use), or should those things (core, pg_basebackup) be running in version lock-step? Regards, -J.
Re: Fix documentation for pg_stat_statements JIT deform_counter
> On 11 Nov 2023, at 10:26, Julien Rouhaud wrote: > I was adding support for the new pg_stat_statements JIT deform_counter in PoWA > when I realized that those were added after jit_generation_time in the > documentation while they're actually at the end of the view. Nice catch, that was indeed an omission in the original commit. Thanks for the patch, I'll apply that shortly. -- Daniel Gustafsson
Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression
On Wed, Nov 15, 2023 at 5:09 PM Peter Eisentraut wrote: > On 14.11.23 11:40, Amul Sul wrote: > > Please have a look at the attached version, updating the syntax to have "AS" > > after EXPRESSION and other changes suggested previously. > > The code structure looks good to me now. > Thank you for your review. > > Question: Why are you using AT_PASS_ADD_OTHERCONSTR? I don't know if > it's right or wrong, but if you have a specific reason, it would be good > to know. > I referred to ALTER COLUMN DEFAULT and used that. > I think ATExecSetExpression() needs to lock pg_attribute? Did you lose > that during the refactoring? > I removed that intentionally, since unlike ALTER ... DROP EXPRESSION we are not updating anything in pg_attribute. > > Tiny comment: The error message in ATExecSetExpression() does not need > to mention "stored", since it would be also applicable to virtual > generated columns in the future. > I had the same thought, but later decided that when we do the virtual column work, we could simply change it then. I am fine with doing that change now as well; let me know your thoughts. > Documentation additions in alter_table.sgml should use one-space indent > consistently. Also, "This form replaces expression" is missing a "the"? > Ok, will fix that. Regards, Amul
Re: RFC: Pluggable TOAST
On Tue, 14 Nov 2023, 14:12 Nikita Malakhov, wrote: > > Hi! > > Matthias, regarding your message above, I have a question to ask. > On typed TOAST implementations - we thought that the TOAST method used > for storing data could depend not only on data type, but on the flow or > workload, > like our bytea appendable toaster which is much (hundreds of times) faster on > update compared to the regular procedure. That was one of the ideas behind > Pluggable TOAST - we can choose the most suitable TOAST implementation > available. > > If we have a single TOAST entry point for a data type - then we should have > some means to control it or choose a TOAST method suitable to our needs. > Or should not? I'm not sure my interpretation of the question is correct, but I'll assume it's "would you want something like STORAGE [plain/external/...] for controlling type-specific toast operations?". I don't see many reasons why we'd need a system to disable (some of) those features, with the only one being "the workload is mostly read-only of the full attributes, so any performance overhead of type-aware detoasting is not worth the temporary space savings during updates". So, while I do think there would be good reasons for typed toasting to be disabled, I don't see a good reason for only specific parts of type-specific toasting to be disabled (no reason for 'disable the append optimization for bytea, but not the splice optimization'). Kind regards, Matthias van de Meent Neon (https://neon.tech)
Re: Allow tests to pass in OpenSSL FIPS mode
On 15.11.23 00:07, Tom Lane wrote: I'm more concerned about the 3DES situation. Fedora might be a bit ahead of the curve here, but according to the link above, everybody is supposed to be in compliance by the end of 2023. So I'd be inclined to guess that the 3DES-is-rejected case is going to be mainstream before v17 ships. Right. It is curious that I have not found any activity in the OpenSSL issue trackers about this. But if you send me your results file, then I can include it in the patch as an alternative expected.
Re: BUG #18097: Immutable expression not allowed in generated at
Hi, > > Oh no! We encountered one of the most difficult problems in computer > > science [1]. > > Indeed :-(. Looking at it again this morning, I'm thinking of > using "contain_mutable_functions_after_planning" --- what do you > think of that? It's better but creates an impression that actual planning will be involved. According to the comments for expression_planner(): ``` * Currently, we disallow sublinks in standalone expressions, so there's no * real "planning" involved here. (That might not always be true though.) ``` I'm not very familiar with the part of the code responsible for planning, but I find this inconsistency confusing. Since code is written for people to read, and is read more often than it is written, I personally believe that longer and more descriptive names are better. Something like contain_mutable_functions_after_planner_transformations(). This being said, in practice one should read the comments to learn about corner cases, pre- and postconditions anyway, so maybe it's not that big a deal. I think of contain_mutable_functions_after_transformations() as a good compromise between length and descriptiveness. -- Best regards, Aleksander Alekseev
Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression
On 14.11.23 11:40, Amul Sul wrote: Please have a look at the attached version, updating the syntax to have "AS" after EXPRESSION and other changes suggested previously.

The code structure looks good to me now. Question: Why are you using AT_PASS_ADD_OTHERCONSTR? I don't know if it's right or wrong, but if you have a specific reason, it would be good to know. I think ATExecSetExpression() needs to lock pg_attribute? Did you lose that during the refactoring? Tiny comment: The error message in ATExecSetExpression() does not need to mention "stored", since it would also be applicable to virtual generated columns in the future. Documentation additions in alter_table.sgml should use one-space indent consistently. Also, "This form replaces expression" is missing a "the"?
Re: MERGE ... RETURNING
On Mon, 13 Nov 2023 at 05:29, jian he wrote:
> v13 works fine. All tests passed. The code is very intuitive. I played with multiple WHEN clauses, even with before/after row triggers; they work as expected.

Thanks for the review and testing!

> I don't know when replace_outer_merging will be invoked. I even set a breakpoint on it; coverage shows replace_outer_merging is only called once.

It's used when MERGING() is used in a subquery in the RETURNING list. The MergingFunc node in the subquery is replaced by a Param node, referring to the outer MERGE query, so that the result from MERGING() is available in the SELECT subquery (under any other circumstances, you're not allowed to use MERGING() in a SELECT). This is similar to what happens when a subquery contains an aggregate over columns from an outer query only -- for example, see:

https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES:~:text=When%20an%20aggregate,aggregate%20belongs%20to.
https://github.com/postgres/postgres/commit/e649796f128bd8702ba5744d36f4e8cb81f0b754

A MERGING() expression in a subquery in the RETURNING list is analogous, in that it belongs to the outer MERGE query, not the SELECT subquery.

> Doesn't sql-merge.html miss mentioning that RETURNING needs SELECT privilege on the columns? In sql-insert.html, we have: "Use of the RETURNING clause requires SELECT privilege on all columns mentioned in RETURNING. If you use the query clause to insert rows from a query, you of course need to have SELECT privilege on any table or column used in the query."

Ah, good point. I don't think I looked at the privileges paragraph on the MERGE page. Currently it says:

    You will require the SELECT privilege on the data_source and any
    column(s) of the target_table_name referred to in a condition.

Being pedantic, there are 2 problems with that: 1. It might be taken to imply that you need the SELECT privilege on every column of the data_source, which isn't the case. 2. It mentions conditions, but not expressions (such as those that can appear in INSERT and UPDATE actions). A more accurate statement would be:

    You will require the SELECT privilege on any column(s) of the
    data_source and target_table_name referred to in any condition
    or expression.

which is also consistent with the wording used on the UPDATE manual page. Done that way, I don't think it would need to be updated to mention RETURNING, because RETURNING just returns a list of expressions. Again, that would be consistent with the UPDATE page, which doesn't mention RETURNING in its discussion of privileges.

> I saw the change in src/sgml/glossary.sgml, so I looked around. In the "Materialized view (relation)" part: "It cannot be modified via INSERT, UPDATE, or DELETE operations." Do we need to put "MERGE" into that sentence? Also, there are SELECT, INSERT, UPDATE, DELETE entries; do we need to add a MERGE entry in glossary.sgml?

Yes, that makes sense. Attached is a separate patch with those doc updates, intended to be applied and back-patched independently of the main RETURNING patch. Regards, Dean merge-docs.patch.no-cfbot Description: Binary data
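To make the replace_outer_merging case concrete, a hypothetical sketch (the exact MERGING() spelling and return value follow the in-review patch, so treat the details as illustrative only):

    -- MERGING() inside a RETURNING subquery: the parser replaces the
    -- MergingFunc in the inner SELECT with a Param referring to the
    -- outer MERGE, just like an outer-level aggregate in a subquery.
    MERGE INTO target t
    USING source s ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET val = s.val
    WHEN NOT MATCHED THEN INSERT (id, val) VALUES (s.id, s.val)
    RETURNING t.id, (SELECT MERGING());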
Re: pg_basebackup check vs Windows file path limits
On 2023-Nov-13, Andrew Dunstan wrote: > > size? > > https://learn.microsoft.com/en-us/windows/win32/api/ioapiset/nf-ioapiset-deviceiocontrol > Hmm, here's what that page says - I can't see it saying what you're suggesting here - am I missing something?

I don't think so; I think I just confused myself. Reading the docs, it appears that other Windows APIs work as I described, but not this one. Anyway, after looking at it a bit more, I realized that this code uses MAX_PATH as the basis for its buffer's length limit -- and apparently on Windows that's only 260, much shorter than MAXPGPATH (1024), which our own code uses to limit the buffers given to readlink(). So maybe fixing this is just a matter of doing s/MAX_PATH/MAXPGPATH/ in dirmod.c. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
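A minimal sketch of the kind of change being discussed, assuming a readlink()-style buffer in dirmod.c (the declaration is paraphrased for illustration, not quoted from the source):

    /*
     * Illustrative only: size link-target buffers with PostgreSQL's
     * MAXPGPATH (1024) rather than the Windows MAX_PATH (260), matching
     * the limit that readlink() callers already use elsewhere.
     */
    #define MAXPGPATH 1024

    char        linktarget[MAXPGPATH];  /* was: char linktarget[MAX_PATH]; */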
Re: Adding facility for injection points (or probe points?) for more advanced tests
On 2023-Nov-15, Michael Paquier wrote: > On Tue, Nov 14, 2023 at 02:11:50PM +0100, Alvaro Herrera wrote: > > You named the hash table InjectionPointHashByName, which seems weird. > > Is there any *other* way to locate an injection point that is not by > > name? > > I am not sure what you mean here. Names are kind of the most > portable and simplest thing I could think of. Oh, I think you're overthinking what my comment was. I was saying, just name it "InjectionPointsHash". Since there appears to be no room for another hash table for injection points, then there's no need to specify that this one is the ByName hash. I couldn't think of any other way to organize the injection points either. > > In this patch, injection points are instance-wide (because the hash > > table is in shmem). > > Yes, still not something that's required in the core APIs or an > initial batch. I agree that we can do the easy thing first and build it up later. I just hope we don't get too wedded on the current interface because of lack of time in the current release that we get stuck with it. > I am not sure that it is a good idea to enforce a specific conditional > logic in the backend core code. Agreed, let's get more experience on what other types of tests people want to build, and how are things going to interact with each other. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "No necesitamos banderas No reconocemos fronteras" (Jorge González)
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
Translation-wise, this doesn't work, because you're building a string. There's no reason to think that the words "logical" and "physical" should stay untranslated; the message would make no sense, or at least would be very ugly. You should do something like

    if (am_walsender)
    {
        ereport(log_replication_commands ? LOG : DEBUG1,
                SlotIsLogical(s)
                ? errmsg("acquired logical replication slot \"%s\"",
                         NameStr(s->data.name))
                : errmsg("acquired physical replication slot \"%s\"",
                         NameStr(s->data.name)));
    }

(Obviously, lose the "translator:" comments since they are unnecessary.) If you really want to keep the "logical"/"physical" word untranslated, you need to split it out of the sentence somehow. But it would be really horrible IMO. Like

    errmsg("acquired replication slot \"%s\" of type \"%s\"",
           NameStr(s->data.name),
           SlotIsLogical(s) ? "logical" : "physical")

-- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Use it up, wear it out, make it do, or do without"
On non-Windows, hard depend on uselocale(3)
I have been working on using thread-safe locale APIs within Postgres where appropriate[0]. The patch that I originally submitted crashed during initdb (whoops!), so I worked on fixing the crash, which led me to having to touch some code in chklocale.c. That became a frustrating experience, because chklocale.c is compiled in 3 different configurations:

    pgport_variants = {
      '_srv': internal_lib_args + {
        'dependencies': [backend_port_code],
      },
      '': default_lib_args + {
        'dependencies': [frontend_port_code],
      },
      '_shlib': default_lib_args + {
        'pic': true,
        'dependencies': [frontend_port_code],
      },
    }

This means that some APIs I added or changed in pg_locale.c can't be used without conditional compilation depending on which variant is being compiled. Additionally, I also have conditional compilation based on HAVE_USELOCALE and WIN32. I would like to propose removing HAVE_USELOCALE and just having WIN32, which means that Postgres would require uselocale(3) on anything that isn't WIN32. [0]: https://www.postgresql.org/message-id/cwmw5ozbwj10.1yflqwsue5...@neon.tech -- Tristan Partin Neon (https://neon.tech)
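A rough sketch of what the simplified two-way conditional could look like once HAVE_USELOCALE is gone (hypothetical code, not taken from pg_locale.c; on non-Windows systems uselocale(3) is simply assumed to exist):

    #include <locale.h>

    /* Hypothetical helper: switch this thread to the given locale. */
    static void
    set_thread_locale(locale_t loc)
    {
    #ifdef WIN32
        /* Windows has no uselocale(3); use the CRT's per-thread mode. */
        (void) loc;
        _configthreadlocale(_ENABLE_PER_THREAD_LOCALE);
    #else
        /* Everything else is now required to provide uselocale(3). */
        uselocale(loc);
    #endif
    }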
Re: Remove MSVC scripts from the tree
On 15.11.23 05:49, Michael Paquier wrote: Attached is a v4. I'm happy with that. (Note, however, that your rebase didn't pick up commits e7814b40d0 and b41b1a7f49 that I did yesterday. Please check that again.)
Re: Fix some memory leaks in ecpg.addons
On Wed Nov 8, 2023 at 11:52 AM CST, Tom Lane wrote:
> "Tristan Partin" writes:
> > On Wed Nov 8, 2023 at 11:18 AM CST, Michael Meskes wrote:
> >> Agreed, it's not exactly uncommon for tools like ecpg to not worry
> >> about memory. After all it gets freed when the program ends.
> > In the default configuration of AddressSanitizer, I can't even complete
> > a full build of Postgres.
> Why is the meson stuff building ecpg test cases as part of the core build?
> That seems wrong for a number of reasons, not only that we don't hold that
> code to the same standards as the core server.

After looking into this a tiny bit more: we are building the dependencies of the ecpg tests.

    foreach pgc_file : pgc_files
      exe_input = custom_target('@0@.c'.format(pgc_file),
        input: '@0@.pgc'.format(pgc_file),
        output: '@BASENAME@.c',
        command: ecpg_preproc_test_command_start + ['-C', 'ORACLE',] + ecpg_preproc_test_command_end,
        install: false,
        build_by_default: false,
        kwargs: exe_preproc_kw,
      )

      ecpg_test_dependencies += executable(pgc_file,
        exe_input,
        kwargs: ecpg_test_exec_kw,
      )
    endforeach

This is the pattern that we have in all the ecpg/test/*/meson.build files. That ecpg_test_dependencies variable is then used in the actual ecpg tests:

    tests += {
      'name': 'ecpg',
      'sd': meson.current_source_dir(),
      'bd': meson.current_build_dir(),
      'ecpg': {
        'expecteddir': meson.current_source_dir(),
        'inputdir': meson.current_build_dir(),
        'schedule': ecpg_test_files,
        'sql': [
          'sql/twophase',
        ],
        'test_kwargs': {
          'depends': ecpg_test_dependencies,
        },
        'dbname': 'ecpg1_regression,ecpg2_regression',
        'regress_args': ecpg_regress_args,
      },
    }

So in my opinion there is nothing wrong here. The build is working as intended. Does this make sense to you, Tom? -- Tristan Partin Neon (https://neon.tech)
Re: ResourceOwner refactoring
On 13/11/2023 01:08, Thomas Munro wrote: On Mon, Nov 13, 2023 at 11:16 AM Heikki Linnakangas wrote: On 11/11/2023 14:00, Alexander Lakhin wrote: 10.11.2023 17:26, Heikki Linnakangas wrote: I think that is surprising behavior from the DSA facility. When you make allocations with dsa_allocate() or just call dsa_get_address() on an existing dsa_pointer, you wouldn't expect the current resource owner to matter. I think dsa_create/attach() should store the current resource owner in the dsa_area, for use in subsequent operations on the DSA, per attached patch (0002-Fix-dsa.c-with-different-resource-owners.patch). Yeah, I agree that it is surprising and dangerous that the DSM segments can have different owners if you're not careful, and it's not clear that you have to be. Interesting that no one ever reported breakage in parallel query due to this thinko, presumably because the current resource owner always turns out to be either the same one or something with longer life. Yep. I ran check-world with an extra assertion that "CurrentResourceOwner == area->resowner || area->resowner == NULL" to verify that. No failures. With the patch 0002 applied, I'm observing another anomaly: Ok, so the ownership of a dsa was still muddled :-(. Attached is a new version that goes a little further. It replaces the whole 'mapping_pinned' flag in dsa_area with the 'resowner'. When a mapping is pinned, it means that 'resowner == NULL'. This is now similar to how the resowner field and pinning works in dsm.c. This patch makes sense to me. It might be tempting to delete the dsa_pin_mapping() interface completely and say that if CurrentResourceOwner == NULL, then it's effectively (what we used to call) pinned, but I think it's useful for exception-safe construction of multiple objects to be able to start out with a resource owner and then later 'commit' by clearing it. As seen in GetSessionDsmHandle(). Yeah that's useful. I don't like the "pinned mapping" term here. It's confusing because we also have dsa_pin(), which means something different: the area will continue to exist even after all backends have detached from it. I think we should rename 'dsa_pin_mapping()' to 'dsa_forget_resowner()' or something like that. I pushed these fixes, without that renaming. Let's do that as a separate patch if we like that. I didn't backpatch this for now, because we don't seem to have a live bug in back branches. You could argue for backpatching because this could bite us in the future if we backpatch something else that uses dsa's, or if there are extensions using it. We can do that later after we've had this in 'master' for a while. -- Heikki Linnakangas Neon (https://neon.tech)
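As a concrete illustration of the construction-then-"commit" pattern described above, a hedged sketch of backend code (MY_TRANCHE_ID is a placeholder; error handling beyond the resource owner machinery is elided):

    #include "postgres.h"
    #include "utils/dsa.h"

    static dsa_area *
    create_session_area(void)
    {
        /* The new mapping starts out owned by CurrentResourceOwner... */
        dsa_area   *area = dsa_create(MY_TRANCHE_ID);

        /*
         * ...so an ERROR thrown while building objects inside the area
         * releases the mapping automatically.
         */
        (void) dsa_allocate(area, 1024);

        /* Success: "commit" by clearing the resowner link (pinning). */
        dsa_pin_mapping(area);
        return area;
    }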
Re: Allow tests to pass in OpenSSL FIPS mode
> On 15 Nov 2023, at 00:07, Tom Lane wrote: > (In reality, people running FIPS mode are probably pretty > accustomed to seeing this error, so maybe it's not worth the > trouble to improve it.)

In my experience this holds a lot of truth; this is a common error pattern, and while all improvements to error messages are good, it's not a reason to hold off this patch. -- Daniel Gustafsson
Re: Explicitly skip TAP tests under Meson if disabled
On 04.11.23 01:51, Andres Freund wrote: I'd just use a single test() invocation here, and add an argument to testwrap indicating that it should print out the skipped message. That way we a) don't need two test() invocations, b) could still see the test name etc in the test invocation.

Here is a patch that does it that way.

From d58e65a71d2fab40ab22f047999efe06d96d8688 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Wed, 15 Nov 2023 11:00:49 +0100
Subject: [PATCH v2] Explicitly skip TAP tests under Meson if disabled

If the tap_tests option is disabled under Meson, the TAP tests are
currently not registered at all. But this makes it harder to see what
is going on, why suddenly there are fewer tests than before. Instead,
run testwrap with an option that marks the test as skipped. That way,
the total list and count of tests is constant whether the option is
enabled or not.
---
 meson.build        | 5 +++--
 src/tools/testwrap | 5 +++++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/meson.build b/meson.build
index 47c8fcdc53..286d7e4269 100644
--- a/meson.build
+++ b/meson.build
@@ -3252,8 +3252,9 @@ foreach test_dir : tests
       testport += 1
 
     elif kind == 'tap'
+      testwrap_tap = testwrap_base
       if not tap_tests_enabled
-        continue
+        testwrap_tap += ['--skip', 'TAP tests not enabled']
       endif
 
       test_command = [
@@ -3293,7 +3294,7 @@ foreach test_dir : tests
       test(test_dir['name'] / onetap_p,
         python,
         kwargs: test_kwargs,
-        args: testwrap_base + [
+        args: testwrap_tap + [
           '--testgroup', test_dir['name'],
           '--testname', onetap_p,
           '--', test_command,
diff --git a/src/tools/testwrap b/src/tools/testwrap
index 7a64fe76a2..d01e61051c 100755
--- a/src/tools/testwrap
+++ b/src/tools/testwrap
@@ -12,6 +12,7 @@ parser.add_argument('--srcdir', help='source directory of test', type=str)
 parser.add_argument('--basedir', help='base directory of test', type=str)
 parser.add_argument('--testgroup', help='test group', type=str)
 parser.add_argument('--testname', help='test name', type=str)
+parser.add_argument('--skip', help='skip test (with reason)', type=str)
 parser.add_argument('test_command', nargs='*')
 args = parser.parse_args()
 
@@ -23,6 +24,10 @@ print('# executing test in {} group {} test {}'.format(
     testdir, args.testgroup, args.testname))
 sys.stdout.flush()
 
+if args.skip is not None:
+    print('1..0 # Skipped: ' + args.skip)
+    sys.exit(0)
+
 if os.path.exists(testdir) and os.path.isdir(testdir):
     shutil.rmtree(testdir)
 os.makedirs(testdir)
-- 
2.42.1
Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
On Wed, Nov 15, 2023 at 11:00 AM Bharath Rupireddy wrote: > > PSA v15 patch. > The patch looks good to me. I have slightly modified the translator message and commit message in the attached. I'll push this tomorrow unless there are any comments. -- With Regards, Amit Kapila. v16-0001-Log-messages-for-replication-slot-acquisition-an.patch Description: Binary data
Re: Some performance degradation in REL_16 vs REL_15
On 30.10.2023 22:51, Andres Freund wrote: There's really no point in comparing performance with assertions enabled (leaving aside assertions that cause extreme performance differences, making development harder). We very well might have added assertions making things more expensive, without affecting performance in optimized/non-assert builds.

Thanks for the advice! I repeated the measurements on my PC without asserts and with CFLAGS="-O2". I also reduced the number of clients to -c6, to leave a reserve of two cores on my 8-core CPU, and used -j6 accordingly. The results were similar: on my PC REL_10_STABLE (c18c12c9) was faster than REL_16_STABLE (07494a0d), but the effect became weaker: REL_10_STABLE gives ~965+-15 TPS (+-2%) while REL_16_STABLE gives ~920+-30 TPS (+-3%) in the test: pgbench -s8 -c6 -T20 -j6. So 10 is faster than 16 by ~5%. (See raw-my-pc.txt attached for the raw data.)

Then, thanks to my colleagues, I carried out similar measurements on a more powerful 24-core standalone server. There REL_10_STABLE gives 8260+-100 TPS (+-1%) while REL_16_STABLE gives 8580+-90 TPS (+-1%) in the same test: pgbench -s8 -c6 -T20 -j6. The test gave the opposite result! On that server, 16 is faster than 10 by ~4%. When I scaled the test on the server to get the same reserve of two cores, the results became: REL_10_STABLE gives ~16000+-300 TPS (+-2%) while REL_16_STABLE gives ~18500+-200 TPS (+-1%) in the scaled test: pgbench -s24 -c22 -T20 -j22. Here the difference is more noticeable: 16 is faster than 10 by ~15%. (raw-server.txt)

The configure options and test scripts on my PC and the server were the same:

    export CFLAGS="-O2"
    ./configure --enable-debug --with-perl --with-icu --enable-depend --enable-tap-tests
    #reinstall
    #reinitdb
    #create database bench
    for ((i=0; i<100; i++)); do pgbench -U postgres -i -s8 bench > /dev/null 2>&1; psql -U postgres -d bench -c "checkpoint"; RES=$(pgbench -U postgres -c6 -T20 -j6 bench;

Configurations:
My PC: 8-core AMD Ryzen 7 4700U @ 1.4GHz, 64GB RAM, NVMe M.2 SSD drive. Linux 5.15.0-88-generic #98~20.04.1-Ubuntu SMP Mon Oct 9 16:43:45 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Server: 2x 12-hyperthreaded cores Intel(R) Xeon(R) CPU X5675 @ 3.07GHz, 24GB RAM, RAID from SSD drives. Linux 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux

I can't understand why I get opposite results on my PC and on the server. It is clear that the absolute TPS values will be different for various configurations; this is normal. But the differences? It seems that some kind of reference configuration is needed to accurately measure the difference in performance. Probably something is wrong with my PC, but right now I cannot figure out what. Would be very grateful for any advice or comments to clarify this problem. With the best wishes! -- Anton A. Melnikov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
raw-my-pc.txt Description: raw TPS data from 100 runs of "pgbench -s8 -c6 -T20 -j6" on REL_10_STABLE (c18c12c983a84d55e58b176969782c7ffac3272b) and REL_16_STABLE (07494a0df9a66219e5f1029de47ecabc34c9cb55)