Re: Regression in COPY FROM caused by 9f8377f7a2
On Mon, 2023-09-25 at 17:49 -0400, Tom Lane wrote:
> Andrew Dunstan writes:
> > On 2023-09-25 Mo 11:06, Andrew Dunstan wrote:
> > > On 2023-09-25 Mo 04:59, Laurenz Albe wrote:
> > > > CREATE TABLE boom (t character varying(5) DEFAULT 'a long string');
> > >
> > > Thinking about this a little more, wouldn't it be better if we
> > > checked at the time we set the default that the value is actually
> > > valid for the given column?  This is only one manifestation of a
> > > problem you could run into given this table definition.
>
> I dunno, it seems at least possible that someone would do this
> deliberately as a means of preventing the column from being defaulted.
> In any case, the current behavior has stood for a very long time and
> no one has complained that an error should be thrown sooner.

Moreover, this makes restoring a pg_dump from v15 to v16 fail, which
should never happen.  This is how I got that bug report.

Yours,
Laurenz Albe
Re: Incorrect handling of OOM in WAL replay leading to data loss
On Thu, Aug 10, 2023 at 02:59:07PM +0900, Michael Paquier wrote:
> My apologies if I sounded unclear here.  It seems to me that we should
> wrap the patch on [1] first, and get it backpatched.  At least that
> makes for less conflicts when 0001 gets merged for HEAD when we are
> able to set a proper error code.  (Was looking at it, actually.)

Now that Thomas Munro has addressed the original problem, making it
possible to correctly trust xl_tot_len with bae868caf22, I am coming
back to this thread.

First, attached is a rebased set:
- 0001 to introduce the new error infrastructure for xlogreader.c with
  an error code, so that callers can tell the difference between an OOM
  and an invalid record.
- 0002 to tweak the startup process.  Once again, I've taken the
  approach of making the startup process behave like a standby on crash
  recovery: each time an OOM is found, we loop and retry.
- 0003 to emulate an OOM failure, which can be used with the attached
  script to see that we don't stop recovery too early.

>>> I guess that it depends on how much responsiveness one may want.
>>> Forcing a failure on OOM is at least something that users would be
>>> immediately able to act on when we don't run a standby but just
>>> recover from a crash, while a standby would do what it is designed
>>> to do, aka continue to replay what it can see.  One issue with the
>>> wait-and-continue is that a standby may loop continuously on OOM,
>>> which could be also bad if there's a replication slot retaining WAL
>>> on the primary.  Perhaps that's just OK to keep doing that for a
>>> standby.  At least this makes the discussion easier for the sake of
>>> this thread: just consider the case of crash recovery when we don't
>>> have a standby.
>>
>> Yeah, I'm with you on focusing on crash recovery cases; that's what I
>> intended.  Sorry for any confusion.
>
> Okay, so we're on the same page here, keeping standbys as they are and
> doing something for the crash recovery case.
For the crash recovery case, one argument that stood out in my mind is
that causing a hard failure has the disadvantage of forcing users to
redo WAL replay from the last redo position, which may be far away even
if the checkpointer now runs during crash recovery.  What I am
proposing on this thread has the merit of avoiding that.  Anyway, let's
discuss more before settling this point for the crash recovery case.

By the way, anything that I am proposing here cannot be backpatched
because of the infrastructure changes required in xlogreader.c, so I am
going to create a second thread with something that could be
backpatched (yeah, likely FATALs on OOM to stop recovery from doing
something bad).
--
Michael

From 3d7bb24fc2f9f070273b63208819c4e54e428d18 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Tue, 26 Sep 2023 15:40:05 +0900
Subject: [PATCH v4 1/3] Add infrastructure to report error codes in WAL
 reader

This commit moves the error state coming from WAL readers into a new
structure that includes the existing pointer to the error message
buffer, and it also gains an error code that is fed back to the callers
of the following routines:
XLogPrefetcherReadRecord()
XLogReadRecord()
XLogNextRecord()
DecodeXLogRecord()

This will help in improving the decisions to take during recovery
depending on the failure mode reported.
---
 src/include/access/xlogprefetcher.h         |   2 +-
 src/include/access/xlogreader.h             |  33 +++-
 src/backend/access/transam/twophase.c       |   8 +-
 src/backend/access/transam/xlog.c           |   6 +-
 src/backend/access/transam/xlogprefetcher.c |   4 +-
 src/backend/access/transam/xlogreader.c     | 170 --
 src/backend/access/transam/xlogrecovery.c   |  14 +-
 src/backend/access/transam/xlogutils.c      |   2 +-
 src/backend/replication/logical/logical.c   |   9 +-
 .../replication/logical/logicalfuncs.c      |   9 +-
 src/backend/replication/slotfuncs.c         |   8 +-
 src/backend/replication/walsender.c         |   8 +-
 src/bin/pg_rewind/parsexlog.c               |  24 +--
 src/bin/pg_waldump/pg_waldump.c             |  10 +-
 contrib/pg_walinspect/pg_walinspect.c       |  11 +-
 src/tools/pgindent/typedefs.list            |   2 +
 16 files changed, 201 insertions(+), 119 deletions(-)

diff --git a/src/include/access/xlogprefetcher.h b/src/include/access/xlogprefetcher.h
index 7dd7f20ad0..5563ad1a67 100644
--- a/src/include/access/xlogprefetcher.h
+++ b/src/include/access/xlogprefetcher.h
@@ -48,7 +48,7 @@
 extern void XLogPrefetcherBeginRead(XLogPrefetcher *prefetcher,
 									XLogRecPtr recPtr);
 
 extern XLogRecord *XLogPrefetcherReadRecord(XLogPrefetcher *prefetcher,
-											char **errmsg);
+											XLogReaderError *errordata);
 
 extern void XLogPrefetcherComputeStats(XLogPrefetcher *prefetcher);
 
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index da32c7db77..06664dc6fb 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogre
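The retry-on-OOM behavior that 0002 gives the startup process can be sketched in miniature. This is a standalone illustration under stated assumptions, not the patch's code: `ReadStatus`, `read_next_record()`, and `replay_one_record()` are made-up names, and the counter `fail_allocs` stands in for real allocation failures inside the WAL reader.

```c
#include <assert.h>

/* Hypothetical error codes mirroring the idea of patch 0001: callers
 * can distinguish "out of memory" from "invalid record". */
typedef enum { READ_OK, READ_OOM, READ_INVALID } ReadStatus;

static int fail_allocs = 0;     /* test knob: number of simulated OOMs */

static ReadStatus read_next_record(void)
{
    if (fail_allocs > 0)
    {
        fail_allocs--;
        return READ_OOM;        /* simulated allocation failure */
    }
    return READ_OK;
}

/* Crash-recovery loop sketched after the proposal: on OOM, retry the
 * same record instead of concluding end-of-recovery; only an invalid
 * record would end replay.  Returns the number of OOM retries done. */
static int replay_one_record(void)
{
    int retries = 0;
    for (;;)
    {
        ReadStatus st = read_next_record();
        if (st == READ_OOM)
        {
            retries++;          /* loop and retry the same record */
            continue;
        }
        /* READ_OK: replay the record; READ_INVALID would stop replay */
        return retries;
    }
}
```

The point of the sketch is that an OOM no longer looks like "no more WAL": it is a transient condition that the startup process waits out, while a genuinely invalid record still terminates recovery.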
Re: Improve the log message output of basic_archive when basic_archive.archive_directory parameter is not set
> On 26 Sep 2023, at 00:20, Nathan Bossart wrote:
>
> On Thu, Sep 21, 2023 at 11:18:00AM +0900, bt23nguyent wrote:
>> -basic_archive_configured(ArchiveModuleState *state)
>> +basic_archive_configured(ArchiveModuleState *state, char **logdetail)
>
> Could we do something more like GUC_check_errdetail() instead to
> maintain backward compatibility with v16?

We'd still need something exported to call into which isn't in 16, so
it wouldn't be more than optically backwards compatible, since a module
written for 17 won't compile for 16, or am I missing something?

--
Daniel Gustafsson
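For readers unfamiliar with the GUC_check_errdetail() pattern being referenced, here is a minimal standalone sketch of the idea: the callee stashes detail text in a well-known buffer instead of growing a new out-parameter, so existing function signatures stay unchanged. Every name here is illustrative; this is not the actual archive-module API.

```c
#include <stdbool.h>
#include <stdio.h>

/* Well-known buffer the caller inspects after a failure, in the style
 * of GUC_check_errdetail_string.  Hypothetical name. */
static char archiver_errdetail[256];

/* Callee-side helper, analogous in spirit to GUC_check_errdetail(). */
#define archive_module_errdetail(fmt, ...) \
    snprintf(archiver_errdetail, sizeof(archiver_errdetail), \
             fmt, __VA_ARGS__)

/* Sketch of a configuration check keeping its v16 signature: the
 * detail message travels via the buffer, not via a new parameter. */
static bool basic_archive_configured_sketch(const char *archive_directory)
{
    if (archive_directory == NULL || archive_directory[0] == '\0')
    {
        archive_module_errdetail("%s is not set",
                                 "basic_archive.archive_directory");
        return false;
    }
    return true;
}
```

The trade-off Daniel points at still applies: the buffer and the helper must be exported by the server somewhere, so a module using them compiles only against versions that provide them.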
Re: [PGdocs] fix description for handling of non-ASCII characters
Hello Hayato and Jian,

On Tue, 4 Jul 2023 01:30:56 +0000 "Hayato Kuroda (Fujitsu)" wrote:
> Dear Jian,
>
> > looks fine. Do you need to add to commitfest?
>
> Thank you for your confirmation!  I registered to the following:
> https://commitfest.postgresql.org/44/4437/

The way the Postgres commitfest process works is that someone has to
update the page to mark "reviewed" and the reviewer has to use the
commitfest website to pass the patches to a committer.

I see a few problems with the English and style of the patches, am
commenting below, and have signed up as a reviewer.  At
commitfest.postgresql.org I have marked the thread as needing author
attention.  Hayato, you will need to mark the thread as needing review
when you have replied to this message.

Jian, you might want to sign on as a reviewer as well.  It can be nice
to have that record of your participation.

Now, to reviewing the patch:

First, it is now best practice in the PG docs to put a line break at
the end of each sentence.  At least for the sentences on the lines you
change.  (No need to update the whole document!)  Please do this in
the next version of your patch.  I don't know if this is a requirement
for acceptance by a committer, but it won't hurt.

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e700782d3c..a4ce99ba4d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7040,9 +7040,8 @@
         The name will be displayed in the pg_stat_activity view and
         included in CSV log entries.  It can also be included in
         regular log entries via the parameter.
-        Only printable ASCII characters may be used in the
-        application_name value. Other characters will be
-        replaced with question marks (?).
+        Non-ASCII characters used in the application_name
+        will be replaced with hexadecimal strings.

Don't use the future tense to describe the system's behavior.  Instead
of "will be" write "are".  (Yes, this change would be an improvement
on the original text.  We should fix it while we're working on it and
our attention is focused.)

It is more accurate, if I understand the issue, to say that characters
are replaced with hexadecimal _representations_ of the input bytes.

Finally, it would be good to know what representation we're talking
about.  Perhaps just give the \xhh example and say: the Postgres
C-style escaped hexadecimal byte value.  And hyperlink to
https://www.postgresql.org/docs/16/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-ESCAPE
(The docbook would be, depending on the text you want to link: C-style
escaped hexadecimal byte value.  I think.  You link to id="someidvalue"
attribute values.)

@@ -8037,10 +8036,9 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         The name can be any string of less than NAMEDATALEN
         characters (64 characters in a standard
-        build). Only printable ASCII characters may be used in the
-        cluster_name value. Other characters will be
-        replaced with question marks (?). No name is shown
-        if this parameter is set to the empty string '' (which is
+        build). Non-ASCII characters used in the cluster_name
+        will be replaced with hexadecimal strings. No name is shown if this
+        parameter is set to the empty string '' (which is
         the default). This parameter can only be set at server start.

Same review as for the first patch hunk.

diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 5062d712e7..98785e87ea 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -1067,9 +1067,8 @@ postgres=# SELECT postgres_fdw_disconnect_all();
       of any length and contain even non-ASCII characters.  However
       when it's passed to and used as application_name in a foreign
       server, note that it will be truncated to less than
-      NAMEDATALEN characters and anything other than
-      printable ASCII characters will be replaced with question
-      marks (?).
+      NAMEDATALEN characters and non-ASCII characters will be
+      replaced with hexadecimal strings.
       See for details.

Same review as for the first patch hunk.

Since both of you have looked and confirmed that the actual behavior
matches the new documentation I have not done this.  But, have either
of you checked that we're really talking about replacing everything
outside the 7-bit ASCII encodings?  My reading of the commit referenced
in the first email of this thread says that it's everything outside of
the _printable_ ASCII encodings: ASCII values outside of the range 32
to 127, inclusive.  Please check.  The docs should probably say
"printable ASCII", or "non-printable ASCII", depending.

I think the meaning of "printable ASCII" is widely enough known to be
32-127.  So "printable" is good enough, right?

Regards,
Karl
Free Software: "You don't pay back, you pay forward."
                 -- Robert A. Heinlein
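The printable-versus-non-printable distinction under discussion can be pinned down in a few lines, assuming the conventional definition of printable ASCII as 32..126 (127 is DEL, a control character). This is only an illustration of the question-mark replacement the old doc text describes, not the server's actual pg_clean_ascii() code:

```c
#include <string.h>

/* Printable ASCII under the conventional definition: space (32)
 * through tilde (126).  Note that 127 (DEL) is excluded. */
static int is_printable_ascii(unsigned char ch)
{
    return ch >= 32 && ch <= 126;
}

/* Replace every non-printable-ASCII byte with '?' in place, echoing
 * the behavior the pre-v16 documentation describes. */
static void clean_ascii_sketch(char *s)
{
    for (; *s; s++)
        if (!is_printable_ascii((unsigned char) *s))
            *s = '?';
}
```

If the server's check really uses 127 as the upper bound inclusive, as Karl's reading of the commit suggests, only the comparison constant changes; the docs wording question stands either way.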
Re: POC: GROUP BY optimization
On 20/7/2023 18:46, Tomas Vondra wrote:
> 2) estimating quicksort comparisons - This relies on ndistinct
> estimates, and I'm not sure how much more reliable we can make those.
> Probably not much :-( Not sure what to do about this, the only thing
> I can think of is to track "reliability" of the estimates and only do
> the reordering if we have high confidence in the estimates.  That
> means we'll miss some optimization opportunities, but it should limit
> the risk.

According to this issue, I see two options:
1. Go through the grouping column list and find the most reliable one.
   If we have a unique column, or a column with statistics whose number
   of distinct values is significantly higher than the ndistinct of the
   other grouping columns, we can place this column first in the
   grouping.  That should guarantee the reliability of such a decision,
   shouldn't it?
2. If we have extended statistics on distinct values and these
   statistics cover some set of the first columns in the grouping list,
   we can optimize these positions.  It also looks reliable.

Any thoughts?

--
regards, Andrey Lepikhov
Postgres Professional
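Option 1 above can be sketched as a standalone toy. The `GroupCol` struct and the "significantly more" margin (a factor of two here) are illustrative stand-ins, not planner code:

```c
#include <stdbool.h>

/* Toy model of a grouping column's ndistinct estimate. */
typedef struct
{
    double ndistinct;   /* estimated number of distinct values */
    bool   reliable;    /* backed by real statistics / uniqueness */
} GroupCol;

/* Return the index of the column to place first in the grouping, or
 * -1 if no trusted estimate beats every other column by a clear
 * margin (factor of 2, chosen arbitrarily for illustration). */
static int pick_leading_column(const GroupCol *cols, int ncols)
{
    int best = -1;

    for (int i = 0; i < ncols; i++)
        if (cols[i].reliable &&
            (best < 0 || cols[i].ndistinct > cols[best].ndistinct))
            best = i;

    if (best < 0)
        return -1;      /* nothing trustworthy: keep the user's order */

    /* Require the winner to dominate all other columns clearly. */
    for (int i = 0; i < ncols; i++)
        if (i != best && cols[i].ndistinct * 2.0 > cols[best].ndistinct)
            return -1;

    return best;
}
```

The "return -1" paths capture the conservative spirit of the thread: when the estimates are not clearly reliable, skip the reordering rather than risk a regression.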
RE: pg_upgrade and logical replication
Dear Michael,

> > 1. Your approach must be back-patched to older versions which
> >    support the logical replication feature, but the oldest one
> >    (PG10) has already been unsupported.  We should not modify such
> >    a branch.
>
> This suggestion would be only for HEAD as it changes the behavior of
> -b.
>
> > 2. Also, the "max_logical_replication_workers = 0" approach would
> >    be consistent with what we are doing now and for the upgrade of
> >    the publisher patch.  Please see the previous discussion [1].
>
> Yeah, you're right.  Consistency would be good across the board, and
> we'd need to take care of the old clusters as well, so the GUC
> enforcement would be needed as well.  It does not strike me that this
> extra IsBinaryUpgrade would hurt anyway?  Forcing the hand of the
> backend has the merit of allowing the removal of the tweak with
> max_logical_replication_workers at some point in the future.

Hmm, our initial motivation is to suppress registering the launcher,
and adding a GUC setting is sufficient for that.  Indeed, registering a
launcher may be harmful, but that seems not to be the goal of this
thread (changing the -b workflow in HEAD is not sufficient on its own
for the issue).  I'm not sure it should be included in the patch sets
here.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Re: Avoid a possible out-of-bounds access (src/backend/optimizer/util/relnode.c)
On Mon, Sep 25, 2023 at 7:14 PM Ranier Vilela wrote:
>
> Em seg., 25 de set. de 2023 às 08:23, Ashutosh Bapat escreveu:
>>
>> On Sat, Sep 23, 2023 at 7:29 PM Ranier Vilela wrote:
>> >
>> > Hi,
>> >
>> > Per Coverity.
>> > CID 1518088 (#2 of 2): Improper use of negative value (NEGATIVE_RETURNS)
>> >
>> > The function bms_singleton_member can return a negative number.
>> >
>> > /*
>> >  * Get a child rel for rel2 with the relids.  See above comments.
>> >  */
>> > if (rel2_is_simple)
>> > {
>> >     int varno = bms_singleton_member(child_relids2);
>> >
>> >     child_rel2 = find_base_rel(root, varno);
>> > }
>> >
>> > It turns out that in the get_matching_part_pairs function
>> > (joinrels.c), the return of bms_singleton_member is passed to the
>> > find_base_rel function, which cannot receive a negative value.
>> >
>> > find_base_rel is protected by an Assertion, which effectively
>> > indicates that the error does not occur in tests and in DEBUG mode.
>> >
>> > But this does not change the fact that bms_singleton_member can
>> > return a negative value, which may occur on some production servers.
>> >
>> > Fix by changing the Assertion into a real test, to protect the
>> > simple_rel_array array.
>>
>> Do you have a scenario where bms_singleton_member() actually returns
>> a -ve number, or is it just a possibility?
>
> Just a possibility.
>
>> bms_make_singleton() has an assertion at the end: Assert(result >= 0);
>> bms_make_singleton() and bms_add_member() explicitly forbid negative
>> values.  It looks like we have covered all the places which can add a
>> negative value to a bitmapset.  Maybe we are missing a place or two.
>> It will be good to investigate it.
>
> I try to do the research, mostly, with runtime compilation.
> As previously stated, the error does not appear in the tests.
> That said, although Assert protects in most cases, that doesn't mean
> it can't occur in a query, running on a server in production mode.
>
> Now thinking about what you said about the Assertion in
> bms_make_singleton.
> I think it's nonsense, no?

Sorry, I didn't write it correctly.  bms_make_singleton() doesn't
accept a negative integer, and bms_get_singleton_member() and
bms_singleton_member() have assertions at the end.  Since there is no
possibility of a negative integer making itself part of a bitmapset,
the two functions Asserting instead of elog'ing is better.  Asserts
are cheaper in production.

> Why design a function that in DEBUG mode prohibits negative returns,
> but in runtime mode allows it?
> After all, why allow a negative return, if for all practical purposes
> this is prohibited?

You haven't given any proof that there's a possibility that a negative
value may be returned.  We are not allowing a negative value to be
returned at all.

> Regarding the find_base_rel function, it is nonsense to protect the
> array with an Assertion.
> After all, we have already protected the upper limit with a real
> test, why not also protect the lower limit?
> The additional test is cheap and makes perfect sense, making the
> function more robust in production mode.
> As an added bonus, modern compilers will probably be able to remove
> the additional test if they deem it not necessary.
> Furthermore, all protections that were added to protect find_base_rel
> calls can eventually be removed, since find_base_rel will accept
> parameters with negative values.

However, I agree that changing find_base_rel() the way you have done
in your patch is fine and mildly future-proof.  +1 to that idea
irrespective of what the bitmapset functions do.

--
Best Wishes,
Ashutosh Bapat
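The shape of the change being agreed on can be shown in a standalone toy. The array size, names, and returning NULL instead of raising an error are all simplifications of the real find_base_rel():

```c
#include <stddef.h>

#define SIMPLE_REL_ARRAY_SIZE 8     /* stand-in for root->simple_rel_array_size */

static void *simple_rel_array[SIMPLE_REL_ARRAY_SIZE];

/* Before the patch, the lower bound was only Assert(relid > 0), which
 * compiles away in production builds; here both bounds are real
 * runtime tests.  The real code raises an error where this sketch
 * returns NULL. */
static void *find_base_rel_sketch(int relid)
{
    if (relid > 0 && relid < SIMPLE_REL_ARRAY_SIZE)
        return simple_rel_array[relid];
    return NULL;    /* out-of-range index, including negative ones */
}
```

The cost argument in the thread applies here too: the extra comparison is cheap, and a compiler that can prove the index non-negative at a call site may drop it entirely.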
Re: pg_upgrade and logical replication
On Wed, 20 Sept 2023 at 16:54, Amit Kapila wrote:
>
> On Fri, Sep 15, 2023 at 3:08 PM vignesh C wrote:
> >
> > The attached v8 version patch has the changes for the same.
>
> Is the check to ensure remote_lsn is valid correct in function
> check_for_subscription_state()?  How about the case where the apply
> worker didn't receive any change but just marked the relation as
> 'ready'?

I agree that remote_lsn will not be valid in the case when all the
tables are in ready state and there are no changes to be sent by the
walsender to the worker.  I was not sure if this check is required in
this case in the check_for_subscription_state function; I was thinking
that this check could be removed.  I'm also checking why the tables
should only be in ready state: regarding the check that is there in
the same function, can we support upgrades when the tables are in
syncdone state or not?  I will post my analysis once I have finished
checking on the same.

Regards,
Vignesh
Re: Remove MSVC scripts from the tree
On Fri, 22 Sep 2023 10:12:29 +0900 Michael Paquier wrote:
> As of today, I can see that the only buildfarm members relying on
> these scripts are bowerbird and hamerkop, so these two would fail if
> the patch attached were to be applied today.  I am adding Andrew D.
> and the maintainers of hamerkop (SRA-OSS) in CC for comments.

hamerkop is not yet prepared for Meson builds, but we plan to work on
this support soon.  If we go with Meson builds exclusively right now,
we will have to temporarily remove the master/HEAD for a while.

Best Regards.
--
SRA OSS LLC
Chen Ningwei
RE: [PoC] pg_upgrade: allow to upgrade publisher node
Dear Bharath,

Again, thank you for reviewing!  PSA a new version.

> Here are some more comments/thoughts on the v44 patch:
>
> 1.
> +# pg_upgrade will fail because the slot still has unconsumed WAL records
> +command_fails(
> +    [
>
> Add a test case to hit fprintf(script, "The slot \"%s\" is invalid\n",
> as well?

Added.  The test was not added before because 002_pg_upgrade.pl did
not do similar checks, but it is worth verifying.  One difficulty was
that the output directory has a millisecond timestamp, so the absolute
path could not be predicted.  So File::Find::find was used to detect
the file.

> 2.
> +'run of pg_upgrade where the new cluster has insufficient
> max_replication_slots');
> +ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
> +    "pg_upgrade_output.d/ not removed after pg_upgrade failure");
>
> +'run of pg_upgrade where the new cluster has the wrong wal_level');
> +ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
> +    "pg_upgrade_output.d/ not removed after pg_upgrade failure");
>
> +'run of pg_upgrade of old cluster with idle replication slots');
> +ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
> +    "pg_upgrade_output.d/ not removed after pg_upgrade failure");
>
> How do these tests recognize the failures are the intended ones?  I
> mean, for instance when pg_upgrade fails for unused replication
> slots/unconsumed WAL records, then just looking at the presence of
> pg_upgrade_output.d might not be sufficient, no?  Using
> command_fails_like instead of command_fails and looking at the
> contents of invalid_logical_replication_slots.txt might help make
> these tests more focused.

Yeah, currently the output was not checked.  I checked and found that
pg_upgrade would output all messages (including error messages) to
stdout, so command_fails_like() could not be used.  Therefore,
command_checks_all() was used instead.

> 3.
> +pg_log(PG_REPORT, "fatal");
> +pg_fatal("Your installation contains invalid logical replication slots.\n"
> +         "These slots can't be copied, so this cluster cannot be upgraded.\n"
> +         "Consider removing such slots or consuming the pending WAL if any,\n"
> +         "and then restart the upgrade.\n"
> +         "A list of invalid logical replication slots is in the file:\n"
> +         "%s", output_path);
>
> It's not just the invalid logical replication slots, but also the
> slots with unconsumed WALs which aren't invalid and can be upgraded
> if ensured the WAL is consumed.  So, a better wording would be:
> pg_fatal("Your installation contains logical replication slots that
> cannot be upgraded.\n"
>          "List of all such logical replication slots is in the file:\n"
>          "These slots can't be copied, so this cluster cannot be upgraded.\n"
>          "Consider removing invalid slots and/or consuming the pending
> WAL if any,\n"
>          "and then restart the upgrade.\n"
>          "%s", output_path);

Fixed.

Also, I ran pgperltidy.  Some formattings were changed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

v45-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch
Description: v45-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch
Re: Doc: vcregress .bat commands list lacks "taptest"
On Tue, 26 Sep 2023 08:18:01 +0900 Michael Paquier wrote:
> On Mon, Sep 25, 2023 at 11:07:57AM -0400, Andrew Dunstan wrote:
> > +1
>
> Thanks, applied and backpatched then.

Thank you!

Regards,
Yugo Nagata

> --
> Michael

--
Yugo NAGATA
Re: Remove distprep
On Wed, Aug 23, 2023 at 12:46:45PM +0200, Peter Eisentraut wrote:
> Apparently, the headerscheck and cpluspluscheck targets still didn't
> work right in the Cirrus jobs.  Here is an updated patch to address
> that.  This is also rebased over some recent changes that affected
> this patch (generated wait events stuff).

-gettext-files: distprep
+gettext-files: postgres

This bit in src/backend/nls.mk does not seem right to me.  The
following sequence works on HEAD, but breaks with the patch because
the files that should be automatically generated from the perl scripts
aren't anymore:
$ ./configure ...
$ make -C src/backend/ gettext-files
[...]
In file included from ../../../../src/include/postgres.h:46,
                 from brin.c:16:
../../../../src/include/utils/elog.h:79:10: fatal error:
utils/errcodes.h: No such file or directory
   79 | #include "utils/errcodes.h"

# Technically, this should depend on Makefile.global, but then
# version.sgml would need to be rebuilt after every configure run,
# even in distribution tarballs.  So this is cheating a bit, but it

There is this comment in doc/src/sgml/Makefile.  Does it still apply?

    Note that building PostgreSQL from the source repository requires
    reasonably up-to-date versions of bison, flex, and Perl.  These
    tools are not needed to build from a distribution tarball, because
    the files generated with these tools are included in the tarball.
    Other tool requirements

This paragraph exists in sourcerepo.sgml, but it should be updated, I
guess, because now these three binaries would be required when
building from a tarball.

# specparse.c and specscanner.c are in the distribution tarball,
# so do not clean them here

This comment in src/test/isolation/Makefile should be removed.
--
Michael

signature.asc
Description: PGP signature
RE: [PoC] pg_upgrade: allow to upgrade publisher node
On Monday, September 25, 2023 7:01 PM Kuroda, Hayato/黒田 隼人 wrote:
> To: 'Bharath Rupireddy'
> Cc: Amit Kapila; Dilip Kumar
>
> > 5. In continuation to the above comment:
> >
> > Why can't this logic be something like - if there's any WAL record
> > seen after a slot's confirmed flush LSN that is of a type generated
> > by a WAL resource manager having the rm_decode function defined,
> > then the slot can't be upgraded.
>
> Thank you for suggesting a new approach!  We have never seen the
> approach before, but at least the XLOG and HEAP2 rmgrs have a decode
> function, so XLOG_CHECKPOINT_SHUTDOWN, XLOG_CHECKPOINT_ONLINE, and
> XLOG_HEAP2_PRUNE could not be ignored by that approach, which seems
> not appropriate.  If you have another approach, I'm very happy if you
> post it.

Another idea around decoding is to check if there is any decoding
output for the WAL records.  For example, we can create a temp slot
and use test_decoding to decode the WAL from the confirmed_flush_lsn
of existing logical replication slots, and if there is any output from
the output plugin, then we consider the WAL has not been consumed yet.
But this means we need to ignore some of the WAL records like
XLOG_XACT_INVALIDATIONS which won't be decoded into the output.  Also,
this approach could be costly as it needs to do the extra decoding and
output, and we need to assume that "all the WAL records including
custom records will be decoded and output if they need to be
consumed".  So it may not be better, but I just share it for
reference.

Best Regards,
Hou zj
Re: Various small doc improvements; plpgsql, schemas, permissions, oidvector
Forgot to attach.  Sorry.

On Mon, 25 Sep 2023 23:30:38 -0500 "Karl O. Pinc" wrote:
> Version 3.
>
> Re-do title, which is all of patch v3-003.
>
> On Mon, 25 Sep 2023 17:55:59 -0500 "Karl O. Pinc" wrote:
>
> > On Mon, 25 Sep 2023 14:14:37 +0200 Daniel Gustafsson wrote:
> > > Once done you can do "git format-patch origin/master -v 1" which
> > > will generate a set of n patches named v1-0001 through v1-000n.
>
> > Done. 11 patches attached. Thanks for the help.
>
> > I am not particularly confident in the top-line commit
> > descriptions.
>
> > The bulk of the commit descriptions are very wordy
>
> Listing all the attachments here for future discussion:
>
> v3-0001-Change-section-heading-to-better-reflect-saving-a.patch
> v3-0002-Change-section-heading-to-better-describe-referen.patch
> v3-0003-Better-section-heading-for-plpgsql-exception-trap.patch
> v3-0004-Describe-how-to-raise-an-exception-in-the-excepti.patch
> v3-0005-Improve-sentences-in-overview-of-system-configura.patch
> v3-0006-Provide-examples-of-listing-all-settings.patch
> v3-0007-Cleanup-summary-of-role-powers.patch
> v3-0008-Explain-the-difference-between-role-attributes-an.patch
> v3-0009-Document-the-oidvector-type.patch
> v3-0010-Improve-sentences-about-the-significance-of-the-s.patch
> v3-0011-Add-a-sub-section-to-describe-schema-resolution.patch

Regards,
Karl
Free Software: "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

>From 122665c4155698abe88e2bd17639a991791b94e3 Mon Sep 17 00:00:00 2001
From: "Karl O. Pinc"
Date: Sun, 24 Sep 2023 15:49:30 -0500
Subject: [PATCH v3 01/11] Change section heading to better reflect saving a
 result in variable(s)

The current section title of "Executing a Command with a Single-Row
Result" does not reflect what the section is really about.  Other
sections make clear how to _execute_ commands, single-row result or
not.  What this section is about is how to _save_ a single row of
results into variable(s).
It would be nice to talk about saving results into variables in the
section heading but I couldn't come up with anything pithy.  "Saving a
Single-Row of a Command's Result" seems good enough, especially since
there are few other places to save results other than in variables.
---
 doc/src/sgml/plpgsql.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index f55e901c7e..8747e84245 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -1126,7 +1126,7 @@ PERFORM create_mv('cs_session_page_requests_mv', my_query);
 
-    Executing a Command with a Single-Row Result
+    Saving a Single-Row of a Command's Result
 
     SELECT INTO
-- 
2.30.2

>From 4de4a31d41124dfa793cc5cce0516673811ea414 Mon Sep 17 00:00:00 2001
From: "Karl O. Pinc"
Date: Sun, 24 Sep 2023 15:52:21 -0500
Subject: [PATCH v3 02/11] Change section heading to better describe reference
 of existing types

The section heading of "Copying Types" does not reflect what the
section is about.  It is not about making copies of data types but
about using the data type of existing columns (or rows) in new type
declarations without having to know what the existing type is.

"Re-Using the Type of Columns and Variables" seems adequate.  Getting
something in there about declarations seems too wordy.  I thought
perhaps "Referencing" instead of "Re-Using", but "referencing" isn't
perfect and "re-using" is generic enough, shorter, and simpler to
read.
---
 doc/src/sgml/plpgsql.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index 8747e84245..874578265e 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -672,7 +672,7 @@ DECLARE
 
-  Copying Types
+  Re-Using the Type of Columns and Variables
 
   variable%TYPE
-- 
2.30.2

>From 80c2b8ef7ad6e610f5c7bdc61b827983a87110e2 Mon Sep 17 00:00:00 2001
From: "Karl O. Pinc"
Date: Sun, 24 Sep 2023 16:03:29 -0500
Subject: [PATCH v3 03/11] Better section heading for plpgsql exception
 trapping

The docs seem to use "error" and "exception" interchangeably, perhaps
50% each.  But they never say that they are the same thing, and in the
larger world they are not.  Errors tend to be something that drop on
the floor and usually halt execution, whereas exceptions can be
trapped and give the programmer more control over the flow of the
program.  (Although, to be fair, exceptions are a subset of errors.)

"Trapping Errors" is not a good section title for these reasons, and
because when it comes to programmatically raising errors in PL/pgSQL
you don't, you raise exceptions.  The current section heading does not
stand out in a scan of the table of contents when you're looking for
exception handling, IMHO.

"Error Processing and Trapping Exceptions" is a little long but it
does accurately reflect that the section is about how PL/pgSQL behaves
under er
Re: Various small doc improvements; plpgsql, schemas, permissions, oidvector
Version 3.

Re-do title, which is all of patch v3-003.

On Mon, 25 Sep 2023 17:55:59 -0500 "Karl O. Pinc" wrote:

> On Mon, 25 Sep 2023 14:14:37 +0200 Daniel Gustafsson wrote:
> > Once done you can do "git format-patch origin/master -v 1" which
> > will generate a set of n patches named v1-0001 through v1-000n.
>
> Done. 11 patches attached. Thanks for the help.
>
> I am not particularly confident in the top-line commit descriptions.
>
> The bulk of the commit descriptions are very wordy
>
> Listing all the attachments here for future discussion:

v3-0001-Change-section-heading-to-better-reflect-saving-a.patch
v3-0002-Change-section-heading-to-better-describe-referen.patch
v3-0003-Better-section-heading-for-plpgsql-exception-trap.patch
v3-0004-Describe-how-to-raise-an-exception-in-the-excepti.patch
v3-0005-Improve-sentences-in-overview-of-system-configura.patch
v3-0006-Provide-examples-of-listing-all-settings.patch
v3-0007-Cleanup-summary-of-role-powers.patch
v3-0008-Explain-the-difference-between-role-attributes-an.patch
v3-0009-Document-the-oidvector-type.patch
v3-0010-Improve-sentences-about-the-significance-of-the-s.patch
v3-0011-Add-a-sub-section-to-describe-schema-resolution.patch

Regards,
Karl
Free Software: "You don't pay back, you pay forward."
                 -- Robert A. Heinlein
How are jsonb_path_query SRFs $.datetime() defined ?
Hi,

In the regression tests I see tests like the one below [1], but generally, jsonb_path_query is an SRF, and the definition in pg_proc.dat has it as such [0].  Looking at the implementation, it doesn't look like it calls jsonb_path_query_first behind the scenes (say, if it detects it's being called in SELECT), which would explain it.  How would I re-create this same behaviour with, say, another $.func() in jsonb, or with any SRF in general?

[0]
{ oid => '4006', descr => 'jsonpath query',
  proname => 'jsonb_path_query', prorows => '1000', proretset => 't',
  prorettype => 'jsonb', proargtypes => 'jsonb jsonpath jsonb bool',
  prosrc => 'jsonb_path_query' },

[1]
select jsonb_path_query('"10-03-2017"', '$.datetime("dd-mm-")');
 jsonb_path_query
------------------
 "2017-03-10"
(1 row)
Re: [HACKERS] Should logtape.c blocks be of type long?
On Sun, Sep 24, 2023 at 10:42:49AM +0900, Michael Paquier wrote:
> Indeed, or Windows decides that making long 8-byte is wiser, but I
> doubt that's ever going to happen on backward-compatibility ground.

While looking more at that, I've noticed that I missed BufFileAppend() and BufFileSeekBlock(), which themselves rely on long.  The other code paths calling these two routines rely on BlockNumber (aka uint32), so that seems to be the bottom of it.

For now, I have registered this patch in the next CF:
https://commitfest.postgresql.org/45/4589/

Comments are welcome.
--
Michael

From 331b2433bcf11f4e028002f52a64f6af91266b88 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Tue, 26 Sep 2023 13:07:56 +0900
Subject: [PATCH] Change logtape/tuplestore code to use int64 for block numbers

The code previously relied on long to track block numbers, which would
be 4 bytes in all Windows builds or any 32-bit builds.  This limited
the code to handling up to 16TB of data with the default block size,
like CLUSTER.

Discussion: https://postgr.es/m/CAH2-WznCscXnWmnj=STC0aSa7QG+BRedDnZsP=jo_r9guzv...@mail.gmail.com
---
 src/include/storage/buffile.h      |   4 +-
 src/include/utils/logtape.h        |   8 +-
 src/backend/storage/file/buffile.c |  10 +--
 src/backend/utils/sort/logtape.c   | 126 ++---
 src/backend/utils/sort/tuplesort.c |  12 +--
 5 files changed, 80 insertions(+), 80 deletions(-)

diff --git a/src/include/storage/buffile.h b/src/include/storage/buffile.h
index 6583766719..cbffc9c77e 100644
--- a/src/include/storage/buffile.h
+++ b/src/include/storage/buffile.h
@@ -44,9 +44,9 @@ extern size_t BufFileReadMaybeEOF(BufFile *file, void *ptr, size_t size, bool eo
 extern void BufFileWrite(BufFile *file, const void *ptr, size_t size);
 extern int	BufFileSeek(BufFile *file, int fileno, off_t offset, int whence);
 extern void BufFileTell(BufFile *file, int *fileno, off_t *offset);
-extern int	BufFileSeekBlock(BufFile *file, long blknum);
+extern int	BufFileSeekBlock(BufFile *file, int64 blknum);
 extern int64 BufFileSize(BufFile *file);
-extern long BufFileAppend(BufFile *target, BufFile *source);
+extern int64 BufFileAppend(BufFile *target, BufFile *source);
 extern BufFile *BufFileCreateFileSet(FileSet *fileset, const char *name);
 extern void BufFileExportFileSet(BufFile *file);
diff --git a/src/include/utils/logtape.h b/src/include/utils/logtape.h
index 5420a24ac9..2de7add81c 100644
--- a/src/include/utils/logtape.h
+++ b/src/include/utils/logtape.h
@@ -51,7 +51,7 @@ typedef struct TapeShare
 	 * Currently, all the leader process needs is the location of the
 	 * materialized tape's first block.
 	 */
-	long		firstblocknumber;
+	int64		firstblocknumber;
 } TapeShare;

 /*
@@ -70,8 +70,8 @@ extern void LogicalTapeWrite(LogicalTape *lt, const void *ptr, size_t size);
 extern void LogicalTapeRewindForRead(LogicalTape *lt, size_t buffer_size);
 extern void LogicalTapeFreeze(LogicalTape *lt, TapeShare *share);
 extern size_t LogicalTapeBackspace(LogicalTape *lt, size_t size);
-extern void LogicalTapeSeek(LogicalTape *lt, long blocknum, int offset);
-extern void LogicalTapeTell(LogicalTape *lt, long *blocknum, int *offset);
-extern long LogicalTapeSetBlocks(LogicalTapeSet *lts);
+extern void LogicalTapeSeek(LogicalTape *lt, int64 blocknum, int offset);
+extern void LogicalTapeTell(LogicalTape *lt, int64 *blocknum, int *offset);
+extern int64 LogicalTapeSetBlocks(LogicalTapeSet *lts);

 #endif							/* LOGTAPE_H */
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index 41ab64100e..919e1106d6 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -841,14 +841,14 @@ BufFileTell(BufFile *file, int *fileno, off_t *offset)
 *
 * Performs absolute seek to the start of the n'th BLCKSZ-sized block of
 * the file.  Note that users of this interface will fail if their files
- * exceed BLCKSZ * LONG_MAX bytes, but that is quite a lot; we don't work
- * with tables bigger than that, either...
+ * exceed BLCKSZ * PG_INT64_MAX bytes, but that is quite a lot; we don't
+ * work with tables bigger than that, either...
 *
 * Result is 0 if OK, EOF if not.  Logical position is not moved if an
 * impossible seek is attempted.
 */
 int
-BufFileSeekBlock(BufFile *file, long blknum)
+BufFileSeekBlock(BufFile *file, int64 blknum)
 {
 	return BufFileSeek(file,
 					   (int) (blknum / BUFFILE_SEG_SIZE),
@@ -919,10 +919,10 @@ BufFileSize(BufFile *file)
 * begins.  Caller should apply this as an offset when working off block
 * positions that are in terms of the original BufFile space.
 */
-long
+int64
 BufFileAppend(BufFile *target, BufFile *source)
 {
-	long		startBlock = target->numFiles * BUFFILE_SEG_SIZE;
+	int64		startBlock = target->numFiles * BUFFILE_SEG_SIZE;
 	int			newNumFiles = target->numFiles + source->numFiles;
 	int			i;

diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index f
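The commit message's 16TB ceiling can be sanity-checked with a quick back-of-the-envelope calculation (a sketch only; BLCKSZ here is PostgreSQL's default 8kB block size, and the cap comes from the block number itself overflowing a 4-byte signed long):

```python
# Why a 4-byte signed "long" caps logtape/tuplestore data at roughly 16TB:
# a file cannot be addressed past INT32_MAX blocks of BLCKSZ bytes each.
BLCKSZ = 8192            # default PostgreSQL block size, in bytes
INT32_MAX = 2**31 - 1    # largest block number a 4-byte signed long holds

limit_bytes = INT32_MAX * BLCKSZ
limit_tib = limit_bytes / 2**40
print(f"addressable with 4-byte block numbers: {limit_tib:.1f} TiB")
```

With int64 block numbers the same arithmetic puts the ceiling far beyond any practical file size, which is the point of the patch.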
Re: pg_upgrade and logical replication
On Mon, Sep 25, 2023 at 11:43 AM Michael Paquier wrote:
>
> On Mon, Sep 25, 2023 at 10:05:41AM +0530, Amit Kapila wrote:
> > I also don't think that this patch has to solve the problem of
> > publishers in any way, but as per my understanding, if for some
> > reason we are not able to do the upgrade of publishers, this can add
> > more steps for users than they have to do now for logical replication
> > setup after an upgrade.  This is because now, after restoring the
> > subscription rels and origin, as soon as we start replication after
> > creating the slots on the publisher, we will never be able to
> > guarantee data consistency.  So, they need to drop the entire
> > subscription setup, including truncating the relations, and then set it
> > up from scratch, which also means they need to somehow remember or take
> > a dump of the current subscription setup.  According to me, the key
> > point is to have a mechanism to set up slots correctly to allow
> > replication (or subscriptions) to work after the upgrade.  Without
> > that, it appears to me that we are restoring a subscription where it
> > can start from some random LSN and can easily lead to data consistency
> > issues where it can miss some of the updates.
>
> Sure, that's assuming that the publisher side is upgraded.
>

At some point, the user needs to upgrade the publisher, and the subscriber could itself have some publications defined, which means the downstream subscribers will have the same problem.

> FWIW, my
> take is that there's room to move forward with this patch anyway in
> favor of cases like rollover upgrades to the subscriber.
>
> > This is the primary reason why I prioritized working on the publisher
> > side before getting this patch done; otherwise, the solution for this
> > patch was relatively clear.  I am not sure but I guess this could be
> > the reason why originally we left it in the current state; otherwise,
> > restoring subscription rels or origin doesn't seem to be too much of
> > an additional effort than what we are doing now.
>
> By "additional effort", you are referring to what the patch is doing,
> with the binary dump of pg_subscription_rel, right?
>

Yes.

--
With Regards,
Amit Kapila.
Could not run generate_unaccent_rules.py script when update unicode
Hi, hackers

When I try to update the unicode mapping tables using make update-unicode [1], I encounter the following error:

generate_unaccent_rules.py --unicode-data-file ../../src/common/unicode/UnicodeData.txt --latin-ascii-file Latin-ASCII.xml >unaccent.rules
/bin/sh: 1: generate_unaccent_rules.py: not found
make: *** [Makefile:33: unaccent.rules] Error 127
make: *** Deleting file 'unaccent.rules'

generate_unaccent_rules.py is in contrib/unaccent, and the Makefile reads:

# Allow running this even without --with-python
PYTHON ?= python

$(srcdir)/unaccent.rules: generate_unaccent_rules.py ../../src/common/unicode/UnicodeData.txt Latin-ASCII.xml
	$(PYTHON) $< --unicode-data-file $(word 2,$^) --latin-ascii-file $(word 3,$^) >$@

It uses python to run generate_unaccent_rules.py.  However, the ?= operator in a Makefile only checks whether the variable is defined; it does not check whether the variable is empty.  Since PYTHON is defined in src/Makefile.global, PYTHON ends up empty here when building without --with-python.

Here are some examples:

japin@coltd-devel:~$ cat Makefile
PYTHON =
PYTHON ?= python
test:
	echo '$(PYTHON)'
japin@coltd-devel:~$ make
echo ''

japin@coltd-devel:~$ cat Makefile
PYTHON = python3
PYTHON ?= python
test:
	echo '$(PYTHON)'
japin@coltd-devel:~$ make
echo 'python3'
python3

japin@coltd-devel:~$ cat Makefile
PYTHON =
ifeq ($(PYTHON),)
PYTHON = python
endif
test:
	echo '$(PYTHON)'
japin@coltd-devel:~$ make
echo 'python'
python

japin@coltd-devel:~$ cat Makefile
PYTHON = python3
ifeq ($(PYTHON),)
PYTHON = python
endif
test:
	echo '$(PYTHON)'
japin@coltd-devel:~$ make
echo 'python3'
python3

Here is a patch to fix this, any thoughts?
diff --git a/contrib/unaccent/Makefile b/contrib/unaccent/Makefile
index 652a3e774c..3ff49ba1e9 100644
--- a/contrib/unaccent/Makefile
+++ b/contrib/unaccent/Makefile
@@ -26,7 +26,9 @@ endif
 update-unicode: $(srcdir)/unaccent.rules

 # Allow running this even without --with-python
-PYTHON ?= python
+ifeq ($(PYTHON),)
+PYTHON = python
+endif

 $(srcdir)/unaccent.rules: generate_unaccent_rules.py ../../src/common/unicode/UnicodeData.txt Latin-ASCII.xml
 	$(PYTHON) $< --unicode-data-file $(word 2,$^) --latin-ascii-file $(word 3,$^) >$@

[1] https://www.postgresql.org/message-id/MEYP282MB1669AC78EE8374B3DE797A09B6FCA%40MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.
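The distinction the patch relies on can be modeled outside of make (a sketch only — the Makefile examples in the report show GNU make's actual behavior; the function names here are invented for illustration):

```python
# Models make's "?=" (assign only when *undefined*) versus the ifeq-based
# fix (assign when undefined OR defined-but-empty).
def qmark_assign(variables, name, default):
    # "?=" semantics: an empty-but-defined variable is left alone
    if name not in variables:
        variables[name] = default
    return variables[name]

def ifeq_assign(variables, name, default):
    # ifeq ($(VAR),) semantics: empty counts the same as unset
    if not variables.get(name):
        variables[name] = default
    return variables[name]

# PYTHON defined but empty, as src/Makefile.global leaves it when
# building without --with-python:
print(repr(qmark_assign({"PYTHON": ""}, "PYTHON", "python")))  # stays '' (the bug)
print(repr(ifeq_assign({"PYTHON": ""}, "PYTHON", "python")))   # becomes 'python' (the fix)
```

Both forms behave identically when the variable is genuinely undefined; they diverge only in the defined-but-empty case that the report describes.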
Re: Fix a wrong comment in setrefs.c
On Tue, Sep 26, 2023 at 5:45 AM Tom Lane wrote: > Hmm. This kind of makes me itch, because in principle a ressortgroupref > identifier should uniquely identify a sorting/grouping column. If it > fails to do so here, maybe there are outright bugs lurking elsewhere? > > I poked into it a little and determined that the problematic > ressortgroupref values are being assigned during prepunion.c, > which says > > * By convention, all non-resjunk columns in a setop tree have > * ressortgroupref equal to their resno. In some cases the ref > isn't > * needed, but this is a cleaner way than modifying the tlist > later. > > So at the end of that, we can have Vars in the upper relations' > targetlists that are associated by ressortgroupref with values > in the setop input relations' tlists, but don't match. > (You are right that added-on implicit coercions are one reason for > the expressions to be different, but it's not the only one.) Ah. Thanks for the investigation. > I'm inclined to write the comment more like "Usually the equal() > check is redundant, but in setop plans it may not be, since > prepunion.c assigns ressortgroupref equal to the column resno > without regard to whether that matches the topmost level's > sortgrouprefs and without regard to whether any implicit coercions > are added in the setop tree. We might have to clean that up someday; > but for now, just ignore any false matches." +1. It explains the situation much more clearly and accurately. Thanks Richard
Re: How to update unicode mapping table?
On Tue, 26 Sep 2023 at 06:20, Tom Lane wrote:
> Peter Eisentraut writes:
>> On 25.09.23 08:02, Japin Li wrote:
>>> When I try to update the unicode mapping table through *.xml in
>>> src/backend/utils/mb/Unicode/, it doesn't update the *.map files.
>>> I find the make cannot go to this directory, what can I do to update
>>> the mapping tables?
>
>> This is done by "make update-unicode".
>
> On a slightly related note, I noticed while preparing 3aff1d3fd
> that src/backend/utils/mb/Unicode/Makefile seems a little screwy.
>
> So there doesn't seem to be any clean way to regenerate the fileset
> present in git.  Maybe these targets aren't supposed to be invoked
> here, but then why have a Makefile here at all?  Alternatively,
> maybe we have files in git that shouldn't be there (very likely due
> to the fact that this directory also lacks a .gitignore file).
>

I find those files are not removed when using a VPATH build.

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.
Re: How to update unicode mapping table?
On Tue, 26 Sep 2023 at 05:58, Peter Eisentraut wrote:
> On 25.09.23 08:02, Japin Li wrote:
>> When I try to update the unicode mapping table through *.xml in
>> src/backend/utils/mb/Unicode/, it doesn't update the *.map files.
>> I find the make cannot go to this directory, what can I do to update
>> the mapping tables?
>
> This is done by "make update-unicode".

Thanks, it seems to work.

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.
Re: bug fix and documentation improvement about vacuumdb
> > I've applied this down to v16 now, thanks for the submission! > Thanks for pushing! Masaki Kuwamura
Re: Add 'worker_type' to pg_stat_subscription
On Tue, Sep 26, 2023 at 7:16 AM Nathan Bossart wrote: > > On Thu, Sep 21, 2023 at 04:01:20PM +0530, Amit Kapila wrote: > > The changes looks good to me, though I haven't tested it. But feel > > free to commit if you are fine with this patch. > > Committed. > Thanks for pushing. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Doesn't pgstat_report_wal() handle the argument "force" incorrectly
On Mon, Sep 25, 2023 at 02:49:50PM +0900, Ryoga Yoshida wrote: > On 2023-09-25 14:38, Michael Paquier wrote: >> Another idea would be to do like in pgstat.c by adding the following >> line, then use "nowait" to call each sub-function: >> nowait = !force; >> pgstat_flush_wal(nowait); >> pgstat_flush_io(nowait); > > That's very clear and I think it's good. Done this way down to 15, then, with more comment polishing. -- Michael signature.asc Description: PGP signature
Re: Doc: vcregress .bat commands list lacks "taptest"
On Mon, Sep 25, 2023 at 11:07:57AM -0400, Andrew Dunstan wrote: > +1 Thanks, applied and backpatched then. -- Michael signature.asc Description: PGP signature
Re: Various small doc improvements; plpgsql, schemas, permissions, oidvector
On Mon, 25 Sep 2023 14:14:37 +0200 Daniel Gustafsson wrote:
> > On 25 Sep 2023, at 14:00, Karl O. Pinc wrote:

> > Is there a preferred data format or should I send
> > each patch in a separate attachment with description?

> Once done you can do "git format-patch origin/master -v 1" which will
> generate a set of n patches named v1-0001 through v1-000n.  You can
> then attach those to the thread.

Done.  11 patches attached.  Thanks for the help.  (This is v2, since I made some changes upon review.)

I am not particularly confident in the top-line commit descriptions.  Some seem kind of long and not a whole lot of thought went into them.  But the commit descriptions are for the committer to decide anyway.

The bulk of the commit descriptions are very wordy and will surely need at least some editing.

Listing all the attachments here for future discussion:

v2-0001-Change-section-heading-to-better-reflect-saving-a.patch
v2-0002-Change-section-heading-to-better-describe-referen.patch
v2-0003-Better-section-heading-for-plpgsql-exception-trap.patch
v2-0004-Describe-how-to-raise-an-exception-in-the-excepti.patch
v2-0005-Improve-sentences-in-overview-of-system-configura.patch
v2-0006-Provide-examples-of-listing-all-settings.patch
v2-0007-Cleanup-summary-of-role-powers.patch
v2-0008-Explain-the-difference-between-role-attributes-an.patch
v2-0009-Document-the-oidvector-type.patch
v2-0010-Improve-sentences-about-the-significance-of-the-s.patch
v2-0011-Add-a-sub-section-to-describe-schema-resolution.patch

Regards,

Karl
Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein

>From 122665c4155698abe88e2bd17639a991791b94e3 Mon Sep 17 00:00:00 2001
From: "Karl O. Pinc"
Date: Sun, 24 Sep 2023 15:49:30 -0500
Subject: [PATCH v2 01/11] Change section heading to better reflect saving a result in variable(s)

The current section title of "Executing a Command with a Single-Row Result" does not reflect what the section is really about.  Other sections make clear how to _execute_ commands, single-row result or not.  What this section is about is how to _save_ a single row of results into variable(s).

It would be nice to talk about saving results into variables in the section heading but I couldn't come up with anything pithy.  "Saving a Single-Row of a Command's Result" seems good enough, especially since there are few other places to save results other than in variables.
---
 doc/src/sgml/plpgsql.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index f55e901c7e..8747e84245 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -1126,7 +1126,7 @@ PERFORM create_mv('cs_session_page_requests_mv', my_query);

-Executing a Command with a Single-Row Result
+Saving a Single-Row of a Command's Result

 SELECT INTO
--
2.30.2

>From 4de4a31d41124dfa793cc5cce0516673811ea414 Mon Sep 17 00:00:00 2001
From: "Karl O. Pinc"
Date: Sun, 24 Sep 2023 15:52:21 -0500
Subject: [PATCH v2 02/11] Change section heading to better describe reference of existing types

The section heading of "Copying Types" does not reflect what the section is about.  It is not about making copies of data types but about using the data type of existing columns (or rows) in new type declarations without having to know what the existing type is.

"Re-Using the Type of Columns and Variables" seems adequate.  Getting something in there about declarations seems too wordy.  I thought perhaps "Referencing" instead of "Re-Using", but "referencing" isn't perfect and "re-using" is generic enough, shorter, and simpler to read.
---
 doc/src/sgml/plpgsql.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index 8747e84245..874578265e 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -672,7 +672,7 @@ DECLARE

- Copying Types
+ Re-Using the Type of Columns and Variables

 variable%TYPE
--
2.30.2

>From 0252dd434bb9ab2487cd37a93912d19ca1ef5149 Mon Sep 17 00:00:00 2001
From: "Karl O. Pinc"
Date: Sun, 24 Sep 2023 16:03:29 -0500
Subject: [PATCH v2 03/11] Better section heading for plpgsql exception trapping

The docs seem to use "error" and "exception" interchangeably, perhaps 50% each.  But they never say that they are the same thing, and in the larger world they are not.  Errors tend to be something that drop on the floor and usually halt execution whereas exceptions can be trapped and give the programmer more control over the flow of the program.

"Trapping Errors" is not a good section title for these reasons, and because when it comes to programmatically raising errors in Pl/PgSQL you don't: you raise exceptions.  The current section heading does not stand out in a scan of the table of contents when you're looking for exception handling, IMHO.

"Error Handling and Exception Trapping" is a lit
Re: Partial aggregates pushdown
On Mon, Sep 25, 2023 at 03:18:13AM +, fujii.y...@df.mitsubishielectric.co.jp wrote: > Hi Mr.Bruce, Mr.Pyhalov, Mr.Finnerty, hackers. > > Thank you for your valuable comments. I sincerely apologize for the very late > reply. > Here is a response to your comments or a fix to the patch. > > Tuesday, August 8, 2023 at 3:31 Bruce Momjian > > > I have modified the program except for the point "if the version of the > > > remote server is less than PG17". > > > Instead, we have addressed the following. > > > "If check_partial_aggregate_support is true and the remote server version > > > is older than the local server > > > version, postgres_fdw does not assume that the partial aggregate function > > > is on the remote server unless > > > the partial aggregate function and the aggregate function match." > > > The reason for this is to maintain compatibility with any aggregate > > > function that does not support partial > > > aggregate in one version of V1 (V1 is PG17 or higher), even if the next > > > version supports partial aggregate. > > > For example, string_agg does not support partial aggregation in PG15, but > > > it will support partial aggregation > > > in PG16. > > > > Just to clarify, I think you are saying: > > > > If check_partial_aggregate_support is true and the remote server > > version is older than the local server version, postgres_fdw > > checks if the partial aggregate function exists on the remote > > server during planning and only uses it if it does. > > > > I tried to phrase it in a positive way, and mentioned the plan time > > distinction. Also, I am sorry I was away for most of July and am just > > getting to this. > Thanks for your comment. In the documentation, the description of > check_partial_aggregate_support is as follows > (please see postgres-fdw.sgml). 
> --
> check_partial_aggregate_support (boolean)
> Only if this option is true, during query planning, postgres_fdw connects to
> the remote server and checks if the remote server version is older than the
> local server version.  If so, postgres_fdw assumes that for each built-in
> aggregate function, the partial aggregate function is not defined on the
> remote server unless the partial aggregate function and the aggregate
> function match.  The default is false.
> --

My point is that there are three behaviors:

*  false - no check
*  true, remote version >= sender - no check
*  true, remote version < sender - check

Here is your code:

+/*
+ * Check that a built-in aggpartialfunc exists on the remote server. If
+ * check_partial_aggregate_support is false, we assume the partial aggregate
+ * function exists on the remote server. Otherwise we assume the partial
+ * aggregate function exists on the remote server only if the remote server
+ * version is not less than the local server version.
+ */
+static bool
+is_builtin_aggpartialfunc_shippable(Oid aggpartialfn, PgFdwRelationInfo *fpinfo)
+{
+	bool		shippable = true;
+
+	if (fpinfo->check_partial_aggregate_support)
+	{
+		if (fpinfo->remoteversion == 0)
+		{
+			PGconn	   *conn = GetConnection(fpinfo->user, false, NULL);
+
+			fpinfo->remoteversion = PQserverVersion(conn);
+		}
+		if (fpinfo->remoteversion < PG_VERSION_NUM)
+			shippable = false;
+	}
+	return shippable;
+}

I think this needs to be explained in the docs.  I am ready to adjust the patch to improve the wording whenever you are ready.  Should I do it now and post an updated version for you to use?

--
Bruce Momjian  https://momjian.us
EDB  https://enterprisedb.com

Only you can decide what is important to you.
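The three behaviors listed above can be restated compactly.  This is a hypothetical Python restatement of the decision in the quoted C function, with invented names, not postgres_fdw code (version numbers follow PostgreSQL's PG_VERSION_NUM convention, e.g. 160000 for v16):

```python
# Mirrors the quoted is_builtin_aggpartialfunc_shippable() decision:
#   check disabled                  -> assume shippable (no check)
#   check enabled, remote >= local  -> assume shippable (no check)
#   check enabled, remote <  local  -> not assumed shippable
def partial_agg_shippable(check_support: bool,
                          remote_version: int,
                          local_version: int) -> bool:
    if check_support and remote_version < local_version:
        return False
    return True

assert partial_agg_shippable(False, 150000, 170000)     # no check at all
assert partial_agg_shippable(True, 170000, 170000)      # remote >= local
assert not partial_agg_shippable(True, 160000, 170000)  # remote < local
```

Writing it out this way makes the asymmetry plain: the option only ever *removes* shippability, and only when the remote is older than the local server.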
Re: ALTER ROLE documentation improvement
On Fri, Sep 15, 2023 at 02:25:38PM -0700, Yurii Rashkovskii wrote:
> Thank you for the feedback. I've updated the glossary and updated the
> terminology to be consistent. Please see the new patch attached.

Thanks for the new version of the patch.

     This user owns all system catalog tables in each database.  It is
     also the role from which all granted permissions originate.  Because
     of these things, this
-    role may not be dropped.
+    role may not be dropped. This role must always be a superuser, it can't be changed
+    to be a non-superuser.

I think we should move this note to the sentence just below that mentions its superuserness.  Maybe it could look something like this:

    This role also behaves as a normal database superuser, and its
    superuser status cannot be revoked.

+ Database superusers can change any of these settings for any role, except for
+ changing SUPERUSER to NOSUPERUSER
+ for a bootstrap superuser.

nitpick: s/a bootstrap superuser/the bootstrap superuser

	#: commands/user.c:871
	#, c-format
	-msgid "The bootstrap user must have the %s attribute."
	+msgid "The bootstrap superuser must have the %s attribute."
	 msgstr "Der Bootstrap-Benutzer muss das %s-Attribut haben."

No need to update the translation files.  Those are updated separately in the pgtranslation repo.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: Improve the log message output of basic_archive when basic_archive.archive_directory parameter is not set
On Thu, Sep 21, 2023 at 11:18:00AM +0900, bt23nguyent wrote: > -basic_archive_configured(ArchiveModuleState *state) > +basic_archive_configured(ArchiveModuleState *state, char **logdetail) Could we do something more like GUC_check_errdetail() instead to maintain backward compatibility with v16? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: How to update unicode mapping table?
Peter Eisentraut writes:
> On 25.09.23 08:02, Japin Li wrote:
>> When I try to update the unicode mapping table through *.xml in
>> src/backend/utils/mb/Unicode/, it doesn't update the *.map files.
>> I find the make cannot go to this directory, what can I do to update
>> the mapping tables?

> This is done by "make update-unicode".

On a slightly related note, I noticed while preparing 3aff1d3fd that src/backend/utils/mb/Unicode/Makefile seems a little screwy.  If you go into that directory and type "make distclean", you'll find it removes some files that are in git:

[postgres@sss1 Unicode]$ make distclean
rm -f 8859-10.TXT 8859-13.TXT 8859-14.TXT 8859-15.TXT 8859-16.TXT 8859-2.TXT 8859-3.TXT 8859-4.TXT 8859-5.TXT 8859-6.TXT 8859-7.TXT 8859-8.TXT 8859-9.TXT BIG5.TXT CNS11643.TXT CP1250.TXT CP1251.TXT CP1252.TXT CP1253.TXT CP1254.TXT CP1255.TXT CP1256.TXT CP1257.TXT CP1258.TXT CP866.TXT CP874.TXT CP932.TXT CP936.TXT CP950.TXT JIS0212.TXT JOHAB.TXT KOI8-R.TXT KOI8-U.TXT KSX1001.TXT euc-jis-2004-std.txt gb-18030-2000.xml sjis-0213-2004-std.txt windows-949-2000.xml
[postgres@sss1 Unicode]$ git status
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	deleted:    euc-jis-2004-std.txt
	deleted:    gb-18030-2000.xml
	deleted:    sjis-0213-2004-std.txt

no changes added to commit (use "git add" and/or "git commit -a")

This seems wrong.  If you "make maintainer-clean", that removes even more:

$ git status
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	deleted:    big5_to_utf8.map
	deleted:    euc-jis-2004-std.txt
	deleted:    euc_cn_to_utf8.map
	deleted:    euc_jis_2004_to_utf8.map
	deleted:    euc_jp_to_utf8.map
	deleted:    euc_kr_to_utf8.map
	deleted:    euc_tw_to_utf8.map
	deleted:    gb-18030-2000.xml
	deleted:    gb18030_to_utf8.map
	deleted:    gbk_to_utf8.map
	deleted:    iso8859_10_to_utf8.map
	deleted:    iso8859_13_to_utf8.map
	deleted:    iso8859_14_to_utf8.map
	deleted:    iso8859_15_to_utf8.map
	deleted:    iso8859_16_to_utf8.map
	deleted:    iso8859_2_to_utf8.map
	deleted:    iso8859_3_to_utf8.map
	deleted:    iso8859_4_to_utf8.map
	deleted:    iso8859_5_to_utf8.map
	deleted:    iso8859_6_to_utf8.map
	deleted:    iso8859_7_to_utf8.map
	deleted:    iso8859_8_to_utf8.map
	deleted:    iso8859_9_to_utf8.map
	deleted:    johab_to_utf8.map
	deleted:    koi8r_to_utf8.map
	deleted:    koi8u_to_utf8.map
	deleted:    shift_jis_2004_to_utf8.map
	deleted:    sjis-0213-2004-std.txt
	deleted:    sjis_to_utf8.map
	deleted:    uhc_to_utf8.map
	deleted:    utf8_to_big5.map
	deleted:    utf8_to_euc_cn.map
	deleted:    utf8_to_euc_jis_2004.map
	deleted:    utf8_to_euc_jp.map
	deleted:    utf8_to_euc_kr.map
	deleted:    utf8_to_euc_tw.map
	deleted:    utf8_to_gb18030.map
	deleted:    utf8_to_gbk.map
	deleted:    utf8_to_iso8859_10.map
	deleted:    utf8_to_iso8859_13.map
	deleted:    utf8_to_iso8859_14.map
	deleted:    utf8_to_iso8859_15.map
	deleted:    utf8_to_iso8859_16.map
	deleted:    utf8_to_iso8859_2.map
	deleted:    utf8_to_iso8859_3.map
	deleted:    utf8_to_iso8859_4.map
	deleted:    utf8_to_iso8859_5.map
	deleted:    utf8_to_iso8859_6.map
	deleted:    utf8_to_iso8859_7.map
	deleted:    utf8_to_iso8859_8.map
	deleted:    utf8_to_iso8859_9.map
	deleted:    utf8_to_johab.map
	deleted:    utf8_to_koi8r.map
	deleted:    utf8_to_koi8u.map
	deleted:    utf8_to_shift_jis_2004.map
	deleted:    utf8_to_sjis.map
	deleted:    utf8_to_uhc.map
	deleted:    utf8_to_win1250.map
	deleted:    utf8_to_win1251.map
	deleted:    utf8_to_win1252.map
	deleted:    utf8_to_win1253.map
	deleted:    utf8_to_win1254.map
	deleted:    utf8_to_win1255.map
	deleted:    utf8_to_win1256.map
	deleted:    utf8_to_win1257.map
	deleted:    utf8_to_win1258.map
	deleted:    utf8_to_win866.map
	deleted:    utf8_to_win874.map
	deleted:    win1250_to_utf8.map
	deleted:    win1251_to_utf8.map
	deleted:    win1252_to_utf8.map
	deleted:    win1253_to_utf8.map
	deleted:    win1254_to_utf8.map
	deleted:    win1255_to_utf8.map
	deleted:    win1256_to_utf8.map
	deleted:    win1257_to_utf8.map
	deleted:    win1258_to_utf8.map
	deleted:    win866_to_utf8.map
Re: SET ROLE documentation improvement
On Fri, Sep 15, 2023 at 02:36:16PM -0700, Yurii Rashkovskii wrote: > On Fri, Sep 15, 2023 at 1:47 PM Nathan Bossart > wrote: >> I think another issue is that the aforementioned note doesn't mention the >> new SET option added in 3d14e17. > > How do you think we should word it in that note to make it useful? Maybe something like this: The current session user must have the SET option for the specified role_name, either directly or indirectly via a chain of memberships with the SET option. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: How to update unicode mapping table?
On 25.09.23 08:02, Japin Li wrote: When I try to update the unicode mapping table through *.xml in src/backend/utils/mb/Unicode/, it doesn't update the *.map files. I find the make cannot go to this directory, what can I do to update the mapping tables? This is done by "make update-unicode".
Re: Regression in COPY FROM caused by 9f8377f7a2
Andrew Dunstan writes: > On 2023-09-25 Mo 11:06, Andrew Dunstan wrote: >> On 2023-09-25 Mo 04:59, Laurenz Albe wrote: >>> CREATE TABLE boom (t character varying(5) DEFAULT 'a long string'); > Thinking about this a little more, wouldn't it be better if we checked > at the time we set the default that the value is actually valid for the > given column? This is only one manifestation of a problem you could run > into given this table definition. I dunno, it seems at least possible that someone would do this deliberately as a means of preventing the column from being defaulted. In any case, the current behavior has stood for a very long time and no one has complained that an error should be thrown sooner. regards, tom lane
Re: Fix a wrong comment in setrefs.c
Richard Guo writes: > I noticed a wrong comment in search_indexed_tlist_for_sortgroupref(). > /* The equal() check should be redundant, but let's be paranoid */ > It turns out that the equal() check is necessary, because the given > sort/group expression might be type of FuncExpr which converts integer > to numeric. Hmm. This kind of makes me itch, because in principle a ressortgroupref identifier should uniquely identify a sorting/grouping column. If it fails to do so here, maybe there are outright bugs lurking elsewhere? I poked into it a little and determined that the problematic ressortgroupref values are being assigned during prepunion.c, which says * By convention, all non-resjunk columns in a setop tree have * ressortgroupref equal to their resno. In some cases the ref isn't * needed, but this is a cleaner way than modifying the tlist later. So at the end of that, we can have Vars in the upper relations' targetlists that are associated by ressortgroupref with values in the setop input relations' tlists, but don't match. (You are right that added-on implicit coercions are one reason for the expressions to be different, but it's not the only one.) I'm inclined to write the comment more like "Usually the equal() check is redundant, but in setop plans it may not be, since prepunion.c assigns ressortgroupref equal to the column resno without regard to whether that matches the topmost level's sortgrouprefs and without regard to whether any implicit coercions are added in the setop tree. We might have to clean that up someday; but for now, just ignore any false matches." regards, tom lane
Re: Performance degradation on concurrent COPY into a single relation in PG16.
Hi, On 2023-09-25 12:48:30 -0700, Andres Freund wrote: > On 2023-09-25 15:42:26 -0400, Tom Lane wrote: > > I just did a git bisect run to discover when the failure documented > > in bug #18130 [1] started. And the answer is commit 82a4edabd. > > Now, it's pretty obvious that that commit didn't in itself cause > > problems like this: > > > > postgres=# \copy test from 'bug18130.csv' csv > > ERROR: could not read block 5 in file "base/5/17005": read only 0 of 8192 > > bytes > > CONTEXT: COPY test, line 472: > > "0,185647715,222655,489637,2,2023-07-31,9100.000,302110385,2023-07-30 > > 14:16:36.750981+00,14026347..." > > Ugh. > > > > IMO there must be some very nasty bug lurking in the new > > multiple-block extension logic, that happens to be exposed by this > > test case with 82a4edabd's adjustments to the when-to-extend choices > > but wasn't before that. > > > To save other people the trouble of extracting the in-line data > > in the bug submission, I've attached the test files I was using. > > Thanks, looking at this now. (had to switch locations in between) Uh, huh. The problem is that COPY uses a single BulkInsertState for multiple partitions. Which to me seems to run counter to the following comment: * The caller can also provide a BulkInsertState object to optimize many * insertions into the same relation. This keeps a pin on the current * insertion target page (to save pin/unpin cycles) and also passes a * BULKWRITE buffer selection strategy object to the buffer manager. * Passing NULL for bistate selects the default behavior. The reason this doesn't cause straight up corruption due to reusing a pin from another relation is that b1ecb9b3fcfb added ReleaseBulkInsertStatePin() and a call to it. But I didn't make ReleaseBulkInsertStatePin() reset the bulk insertion state, which is what leads to the errors from the bug report. Resetting the relevant BulkInsertState fields fixes the problem. But I'm not sure that's the right fix. 
ISTM that independent of whether we fix this via ReleaseBulkInsertStatePin() resetting the fields or via not reusing BulkInsertState, we should add assertions defending against future issues like this (e.g. by adding a relation field to BulkInsertState in cassert builds, and asserting that the relation is the same as in prior calls unless ReleaseBulkInsertStatePin() has been called). Greetings, Andres Freund
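The proposed defense can be illustrated roughly as follows. This is a Python sketch with invented names — the real change would be a C field guarded by USE_ASSERT_CHECKING — but it captures the idea: remember which relation the bulk-insert state was last used for, and assert that it cannot silently switch relations without a release in between:

```python
# Sketch of the proposed cassert-style defense for BulkInsertState reuse.
# All names are illustrative, not the actual PostgreSQL API.

class BulkInsertState:
    def __init__(self):
        self.current_block = None   # cached insertion target block
        self.relation = None        # debug-only: owner of the cached state

    def release_pin(self):
        # Analogue of ReleaseBulkInsertStatePin(): the fix under discussion
        # is that releasing must also reset the cached insertion state.
        self.current_block = None
        self.relation = None

    def insert(self, relation, block):
        # Assert we are not reusing stale state from a different relation.
        assert self.relation is None or self.relation == relation, \
            "BulkInsertState reused across relations without release"
        self.relation = relation
        self.current_block = block

bistate = BulkInsertState()
bistate.insert("partition_a", 5)
bistate.release_pin()               # correct: reset before switching
bistate.insert("partition_b", 0)

caught = False
try:
    bistate.insert("partition_c", 3)  # stale state: no release in between
except AssertionError:
    caught = True
assert caught
```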
Re: Eager page freeze criteria clarification
On Mon, Sep 25, 2023 at 11:45 AM Robert Haas wrote:
> > The reason I was thinking of using the "lsn at the end of the last vacuum", is
> > that it seems to be more adaptive to the frequency of vacuuming.
>
> Yes, but I think it's *too* adaptive. The frequency of vacuuming can
> plausibly be multiple times per minute or not even annually. That's
> too big a range of variation.

+1. The risk of VACUUM chasing its own tail seems very real. We want VACUUM to be adaptive to the workload, not adaptive to itself.

> Yeah, I don't know if that's exactly the right idea, but I think it's
> in the direction that I was thinking about. I'd even be happy with
> 100% of the time-between-recent checkpoints, maybe even 200% of
> time-between-recent checkpoints. But I think there probably should be
> some threshold beyond which we say "look, this doesn't look like it
> gets touched that much, let's just freeze it so we don't have to come
> back to it again later."

The sole justification for any strategy that freezes lazily is that it can avoid useless freezing when freezing turns out to be unnecessary -- that's it. So I find it more natural to think of freezing as the default action, and *not freezing* as the thing that requires justification. Thinking about it "backwards" like that just seems simpler to me. There is only one possible reason to not freeze, but several reasons to freeze.

> I think part of the calculus here should probably be that when the
> freeze threshold is long, the potential gains from making it even
> longer are not that much. If I change the freeze threshold on a table
> from 1 minute to 1 hour, I can potentially save uselessly freezing
> that page 59 times per hour, every hour, forever, if the page always
> gets modified right after I touch it. If I change the freeze threshold
> on a table from 1 hour to 1 day, I can only save 23 unnecessary
> freezes per day.

I totally agree with you on this point.
It seems related to my point about "freezing being the conceptual default action" in VACUUM. Generally speaking, over-freezing is a problem when we reach the same wrong conclusion (freeze the page) about the same relatively few pages over and over -- senselessly repeating those mistakes really adds up when you're vacuuming the same table very frequently. On the other hand, under-freezing is typically a problem when we reach the same wrong conclusion (don't freeze the page) about lots of pages only once in a very long while. I strongly suspect that there is very little gray area between the two, across the full spectrum of application characteristics.

Most individual pages have very little chance of being modified in the short to medium term. In a perfect world, with a perfect algorithm, we'd almost certainly be freezing most pages at the earliest opportunity. It is nevertheless also true that a freezing policy that is only somewhat more aggressive than this ideal oracle algorithm will freeze way too aggressively (by at least some measures). There isn't much of a paradox to resolve here: it's all down to the cadence of vacuuming, and of rows subject to constant churn.

As you point out, the "same policy" can produce dramatically different outcomes when you actually consider what the consequences of the policy are over time, when applied by VACUUM under a variety of different workload conditions. So any freezing policy must be designed with due consideration for those sorts of things. If VACUUM doesn't freeze the page now, then when will it freeze it? For most individual pages, that time will come (again, pages that benefit from lazy vacuuming are the exception rather than the rule). Right now, VACUUM almost behaves as if it thought "that's not my problem, it's a problem for future me!".

Trying to differentiate between pages that we must not over-freeze and pages that we must not under-freeze seems important.
Generational garbage collection (as used by managed VM runtimes) does something that seems a little like this. It's based on the empirical observation that "most objects die young". What the precise definition of "young" really is varies significantly, but that turns out to be less of a problem than you might think -- it can be derived through feedback cycles. If you look at memory lifetimes on a logarithmic scale, very different sorts of applications tend to look like they have remarkably similar memory allocation characteristics. > Percentage-wise, the overhead of being wrong is the > same in both cases: I can have as many extra freeze operations as I > have page modifications, if I pick the worst possible times to freeze > in every case. But in absolute terms, the savings in the second > scenario are a lot less. Very true. I'm surprised that there hasn't been any discussion of the absolute amount of system-wide freeze debt on this thread. If 90% of the pages in the entire database are frozen, it'll generally be okay if we make the wrong call by freezing lazily when we shouldn't. This i
Re: Add 'worker_type' to pg_stat_subscription
On Thu, Sep 21, 2023 at 04:01:20PM +0530, Amit Kapila wrote:
> The changes look good to me, though I haven't tested it. But feel
> free to commit if you are fine with this patch.

Committed.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: SQL:2011 application time
On 25.09.23 21:20, Paul Jungwirth wrote:
> On 9/24/23 21:52, jian he wrote:
>> On Wed, Sep 20, 2023 at 10:50 AM Paul Jungwirth wrote:
>>> On 9/17/23 20:11, jian he wrote:
>>>> small issues so far I found, v14.
>>> Thank you again for the review! v15 is attached.
>> hi. some tiny issues.
> Rebased v16 patches attached.

Looking through the tests in v16-0001:

+-- PK with no columns just WITHOUT OVERLAPS:
+CREATE TABLE temporal_rng (
+  valid_at tsrange,
+  CONSTRAINT temporal_rng_pk PRIMARY KEY (valid_at WITHOUT OVERLAPS)
+);
+ERROR: syntax error at or near "WITHOUT"
+LINE 3: CONSTRAINT temporal_rng_pk PRIMARY KEY (valid_at WITHOUT OV...
+                                                         ^

I think this error is confusing. The SQL standard requires at least one non-period column in a PK. I don't know why that is or why we should implement it. But if we want to implement it, maybe we should enforce that in parse analysis rather than directly in the parser, to be able to produce a more friendly error message.

+-- PK with a range column/PERIOD that isn't there:
+CREATE TABLE temporal_rng (
+  id INTEGER,
+  CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
+);
+ERROR: range or PERIOD "valid_at" in WITHOUT OVERLAPS does not exist

I think here we should just produce a "column doesn't exist" error message, the same as if the "id" column was invalid. We don't need to get into the details of what kind of column it should be. That is done in the next test:

+ERROR: column "valid_at" in WITHOUT OVERLAPS is not a range type

Also, in any case it would be nice to have a location pointer here (for both cases).

+-- PK with one column plus a range:
+CREATE TABLE temporal_rng (
+  -- Since we can't depend on having btree_gist here,
+  -- use an int4range instead of an int.
+  -- (The rangetypes regression test uses the same trick.)
+  id int4range,
+  valid_at tsrange,
+  CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
+);

I'm confused why you are using int4range here (and in further tests) for the scalar (non-range) part of the primary key. Wouldn't a plain int4 serve here?

+SELECT pg_get_indexdef(conindid, 0, true) FROM pg_constraint WHERE conname = 'temporal_rng_pk';
+                               pg_get_indexdef
+--------------------------------------------------------------------------------
+ CREATE UNIQUE INDEX temporal_rng_pk ON temporal_rng USING gist (id, valid_at)

Shouldn't this somehow show the operator classes for the columns? We are using different operator classes for the id and valid_at columns, aren't we?

+-- PK with USING INDEX (not possible):
+CREATE TABLE temporal3 (
+  id int4range,
+  valid_at tsrange
+);
+CREATE INDEX idx_temporal3_uq ON temporal3 USING gist (id, valid_at);
+ALTER TABLE temporal3
+  ADD CONSTRAINT temporal3_pk
+  PRIMARY KEY USING INDEX idx_temporal3_uq;
+ERROR: "idx_temporal3_uq" is not a unique index
+LINE 2: ADD CONSTRAINT temporal3_pk
+        ^
+DETAIL: Cannot create a primary key or unique constraint using such an index.

Could you also add a test where the index is unique and the whole thing does work?

Apart from the tests, how about renaming the column pg_constraint.contemporal to something like conwithoutoverlaps?
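For readers following along, the rule these tests exercise can be modeled briefly. This is an illustrative Python sketch of the WITHOUT OVERLAPS uniqueness semantics — two rows conflict when their scalar key parts are equal and their ranges overlap — not the patch's actual GiST-based enforcement mechanism:

```python
# Toy model of a temporal PK: (id, valid_at WITHOUT OVERLAPS). Ranges are
# half-open [lo, hi) pairs, matching how range types like tsrange behave
# with default inclusive-lower/exclusive-upper bounds.

def overlaps(a, b):
    """Half-open ranges overlap iff each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]

def violates_temporal_pk(rows, new_row):
    """True if new_row conflicts with an existing row on (id, range)."""
    new_id, new_range = new_row
    return any(rid == new_id and overlaps(r, new_range) for rid, r in rows)

rows = [(1, (0, 10)), (2, (0, 10))]
assert violates_temporal_pk(rows, (1, (5, 15)))       # same id, overlapping
assert not violates_temporal_pk(rows, (1, (10, 20)))  # same id, adjacent only
assert not violates_temporal_pk(rows, (3, (0, 10)))   # different id
```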
Re: Regression in COPY FROM caused by 9f8377f7a2
On 2023-09-25 Mo 11:06, Andrew Dunstan wrote:
> On 2023-09-25 Mo 04:59, Laurenz Albe wrote:
>> On Mon, 2023-09-25 at 09:54 +0200, Laurenz Albe wrote:
>>> In v16 and later, the following fails:
>>>
>>> CREATE TABLE boom (t character varying(5) DEFAULT 'a long string');
>>> COPY boom FROM STDIN;
>>> ERROR: value too long for type character varying(5)
>>>
>>> In PostgreSQL v15 and earlier, the COPY statement succeeds.
>> The error is thrown in BeginCopyFrom in line 1578 (HEAD)
>>
>> defexpr = expression_planner(defexpr);
>>
>> Bisecting shows that the regression was introduced by commit 9f8377f7a2, which introduced DEFAULT values for COPY FROM.

Thinking about this a little more, wouldn't it be better if we checked at the time we set the default that the value is actually valid for the given column? This is only one manifestation of a problem you could run into given this table definition.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
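The suggestion amounts to moving the same length check earlier. A toy Python model of the two behaviors (illustrative only — the real check goes through the varchar type's input and coercion machinery):

```python
# Two choices for when to validate a column default against the column
# type: when the default is defined (the suggestion above), or only when
# the default is actually evaluated (the long-standing behavior, which
# lets 'a long string' sit on a varchar(5) column indefinitely as long as
# no row ever uses the default).

def check_varchar(value, maxlen):
    if len(value) > maxlen:
        raise ValueError(
            "value too long for type character varying(%d)" % maxlen)
    return value

class Column:
    def __init__(self, maxlen, default, check_default_at_definition):
        if check_default_at_definition:
            check_varchar(default, maxlen)   # eager: reject at CREATE time
        self.maxlen, self.default = maxlen, default

    def insert_default(self):
        # deferred: the error surfaces only when the default is used
        return check_varchar(self.default, self.maxlen)

# Deferred checking: the definition succeeds, first use fails.
col = Column(5, "a long string", check_default_at_definition=False)
failed = False
try:
    col.insert_default()
except ValueError:
    failed = True
assert failed

# Eager checking: the definition itself is rejected.
rejected = False
try:
    Column(5, "a long string", check_default_at_definition=True)
except ValueError:
    rejected = True
assert rejected
```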
Re: Eager page freeze criteria clarification
On Sat, Sep 23, 2023 at 3:53 PM Melanie Plageman wrote: > Freeze tuples on a page opportunistically if the page would be totally > frozen and: > > 4. Buffer is already dirty and no FPI is required OR page LSN is older > than 33% of the LSNs since the last vacuum of the table. > > 5. Buffer is already dirty and no FPI is required AND page LSN is older > than 33% of the LSNs since the last vacuum of the table. > > On master, the heuristic is to freeze a page opportunistically if it > would be totally frozen and if pruning emitted an FPI. > --- > > My takeaways from all of the workload results are as follows: > > Algorithm 4 is too aggressive and regresses performance compared to > master. > > Algorithm 5 freezes surprisingly few pages, especially in workloads > where only the most recent data is being accessed or modified > (append-only, update recent data). > > A work queue-like workload with other concurrent workloads is a > particular challenge for the freeze heuristic, and we should think more > about how to handle this. I feel like we have a sort of Goldilocks problem here: the porridge is either too hot or too cold, never just right. Say we just look at workload B, looking at the difference between pgbench_accounts (which is randomly and frequently updated and thus shouldn't be opportunistically frozen) and pgbench_history (which is append-only and should thus be frozen aggressively). Algorithm 4 gets pgbench_history right and pgbench_accounts wrong, and master does the opposite. In a perfect world, we'd have an algorithm which could distinguish sharply between those cases, ramping up to maximum aggressiveness on pgbench_history while doing nothing at all to pgbench_accounts. Algorithm 5 partially accomplishes this, but the results aren't super-inspiring either. It doesn't add many page freezes in the case where freezing is bad, but it also only manages to freeze a quarter of pgbench_history, where algorithm 4 manages to freeze basically all of it. 
That's a pretty stark difference. Given that algorithm 5 seems to make some mistakes on some of the other workloads, I don't think it's entirely clear that it's an improvement over master, at least in practical terms. It might be worth thinking harder about what it takes specifically to get the pgbench_history case, aka the append-only table case, correct.

One thing that probably doesn't work very well is to freeze pages that are more than X minutes old. Algorithm 5 uses an LSN threshold instead of a wall-clock based threshold, but the effect is the same. I think the problem here is that the vacuum operation essentially happens in an instant. At the instant that it happens, some fraction of the data added since the last vacuum is older than whatever threshold you pick, and the rest is newer. If data is added at a constant rate and you want to freeze at least 90% of the data, your recency threshold has to be no more than 10% of the time since the last vacuum. But with autovacuum_naptime=60s, that's like 6 seconds, and that's way too aggressive for a table like pgbench_accounts.

It seems to me that it's not possible to get both cases right by twiddling the threshold, because pgbench_history wants the threshold to be 0, and pgbench_accounts wants it to be ... perhaps not infinity, because maybe the distribution is Gaussian or Zipfian or something rather than uniform, but probably a couple of minutes.

So I feel like if we want to get both pgbench_history and pgbench_accounts right, we need to consider some additional piece of information that makes those cases distinguishable. Either of those tables can contain a page that hasn't been accessed in 20 seconds, but the correct behavior for such a page differs between one case and the other.

One random idea that I had was to refuse to opportunistically freeze a page more than once while it remains resident in shared_buffers.
The first time we do it, we set a bit in the buffer header or something that suppresses further opportunistic freezing. When the buffer is evicted the bit is cleared. So we can still be wrong on a heavily updated table like pgbench_accounts, but if the table fits in shared_buffers, we'll soon realize that we're getting it wrong a lot and will stop making the same mistake over and over. But this kind of idea only works if the working set is small enough to fit in shared_buffers, so I don't think it's actually a great plan, unless we only care about suppressing excess freezing on workloads that fit in shared_buffers.

A variant on the same theme could be to keep some table-level counters and use them to assess how often we're getting it wrong. If we're often thawing recently-frozen pages, don't freeze so aggressively. But this will not work if different parts of the same table behave differently.

If we don't want to do something like this that somehow responds to the characteristics of a particular page or table, then it seems we either have to freeze quite aggressively to pick up
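The residency idea above can be simulated in a few lines. This is a Python toy — the bit would really live in the buffer header — but it shows the intended self-correcting behavior: a page can be opportunistically frozen at most once per stay in shared_buffers, so a repeatedly thawed page stops being re-frozen until it is evicted and read back in:

```python
# Simulation of "freeze at most once per shared_buffers residency":
# a per-buffer suppression bit, set on opportunistic freezing and
# cleared on eviction. Names are invented for illustration.

class Buffer:
    def __init__(self):
        self.frozen = False
        self.freeze_suppressed = False

    def maybe_opportunistic_freeze(self):
        if self.freeze_suppressed:
            return False            # already froze once this residency
        self.frozen = True
        self.freeze_suppressed = True
        return True

    def modify(self):
        self.frozen = False         # an update thaws the page

    def evict(self):
        self.freeze_suppressed = False  # bit lives only while resident

buf = Buffer()
assert buf.maybe_opportunistic_freeze()      # first attempt freezes
buf.modify()                                 # page immediately thawed
assert not buf.maybe_opportunistic_freeze()  # mistake is not repeated
buf.evict()
assert buf.maybe_opportunistic_freeze()      # eviction resets the bit
```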
Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }
On 9/25/23 14:03, Jeff Davis wrote: On Mon, 2023-09-25 at 12:00 -0400, Joe Conway wrote: Should there be a way to have a separate "execution" search_path? I hadn't considered that and I like that idea for a few reasons: * a lot of the problem cases are for functions that don't need to access tables at all, e.g., in an index expression. * it avoids annoyances with pg_temp, because that's not searched for functions/operators anyway * perhaps we could force the object search_path to be empty for IMMUTABLE functions? I haven't thought it through in detail, but it seems like a promising approach. Related to this, it would be useful if you could grant create on schema for only non-executable objects. You may want to allow a user to create their own tables but not allow them to create their own functions, for example. Right now "GRANT CREATE ON SCHEMA foo" gives the grantee the ability to create "all the things". -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Performance degradation on concurrent COPY into a single relation in PG16.
Hi, On 2023-09-25 15:42:26 -0400, Tom Lane wrote: > I just did a git bisect run to discover when the failure documented > in bug #18130 [1] started. And the answer is commit 82a4edabd. > Now, it's pretty obvious that that commit didn't in itself cause > problems like this: > > postgres=# \copy test from 'bug18130.csv' csv > ERROR: could not read block 5 in file "base/5/17005": read only 0 of 8192 > bytes > CONTEXT: COPY test, line 472: > "0,185647715,222655,489637,2,2023-07-31,9100.000,302110385,2023-07-30 > 14:16:36.750981+00,14026347..." Ugh. > IMO there must be some very nasty bug lurking in the new > multiple-block extension logic, that happens to be exposed by this > test case with 82a4edabd's adjustments to the when-to-extend choices > but wasn't before that. > To save other people the trouble of extracting the in-line data > in the bug submission, I've attached the test files I was using. Thanks, looking at this now. > The DDL is simplified slightly from what was submitted. I'm not > entirely sure why a no-op trigger is needed to provoke the bug... A trigger might prevent using the multi-insert API, which would lead to different execution paths... Greetings, Andres Freund
Re: Performance degradation on concurrent COPY into a single relation in PG16.
Andres Freund writes: > On 2023-09-06 18:01:53 -0400, Tom Lane wrote: >> It turns out that this patch is what's making buildfarm member >> chipmunk fail in contrib/pg_visibility [1]. That's easily reproduced >> by running the test with shared_buffers = 10MB. I didn't dig further >> than the "git bisect" result: > At first I was a bit confounded by not being able to reproduce this. My test > environment had max_connections=110 for some other reason - and the problem > doesn't reproduce with that setting... I just did a git bisect run to discover when the failure documented in bug #18130 [1] started. And the answer is commit 82a4edabd. Now, it's pretty obvious that that commit didn't in itself cause problems like this: postgres=# \copy test from 'bug18130.csv' csv ERROR: could not read block 5 in file "base/5/17005": read only 0 of 8192 bytes CONTEXT: COPY test, line 472: "0,185647715,222655,489637,2,2023-07-31,9100.000,302110385,2023-07-30 14:16:36.750981+00,14026347..." IMO there must be some very nasty bug lurking in the new multiple-block extension logic, that happens to be exposed by this test case with 82a4edabd's adjustments to the when-to-extend choices but wasn't before that. To save other people the trouble of extracting the in-line data in the bug submission, I've attached the test files I was using. The DDL is simplified slightly from what was submitted. I'm not entirely sure why a no-op trigger is needed to provoke the bug... 
regards, tom lane

[1] https://www.postgresql.org/message-id/18130-7a86a7356a75209d%40postgresql.org

-- run this, then \copy test from 'bug18130.csv' csv
create table if not exists test (
    a smallint,
    b bigint,
    c bigint,
    d bigint,
    e smallint,
    plan_date date,
    g numeric(18,7),
    h bigint,
    i timestamptz,
    j bigint,
    k integer,
    l smallint,
    primary key (a, b, c, d, e, plan_date)
) partition by list(a);

/* this is just to generate partition structure automatically */
create or replace function test_add_partitions(a integer, year integer)
returns void volatile strict as $$
declare
    parent text;
    root text;
begin
    root := 'test_' || a::text;
    execute 'create table if not exists ' || root
        || ' partition of test for values in (' || a::text
        || ') partition by hash (b);';
    for b in 0..7 loop
        parent := root || '_' || b::text;
        execute 'create table if not exists ' || parent
            || ' partition of ' || root
            || ' for values with (modulus 8, remainder ' || b::text || ')'
            || ' partition by range (plan_date);';
        execute 'create table if not exists ' || parent || '_y' || year::text
            || 'm1 partition of ' || parent
            || ' for values from (''' || year::text || '-01-01'') to (''' || year::text || '-02-01'');';
        execute 'create table if not exists ' || parent || '_y' || year::text
            || 'm2 partition of ' || parent
            || ' for values from (''' || year::text || '-02-01'') to (''' || year::text || '-03-01'');';
        execute 'create table if not exists ' || parent || '_y' || year::text
            || 'm3 partition of ' || parent
            || ' for values from (''' || year::text || '-03-01'') to (''' || year::text || '-04-01'');';
        execute 'create table if not exists ' || parent || '_y' || year::text
            || 'm4 partition of ' || parent
            || ' for values from (''' || year::text || '-04-01'') to (''' || year::text || '-05-01'');';
        execute 'create table if not exists ' || parent || '_y' || year::text
            || 'm5 partition of ' || parent
            || ' for values from (''' || year::text || '-05-01'') to (''' || year::text || '-06-01'');';
        execute 'create table if not exists ' || parent || '_y' || year::text
            || 'm6 partition of ' || parent
            || ' for values from (''' || year::text || '-06-01'') to (''' || year::text || '-07-01'');';
        execute 'create table if not exists ' || parent || '_y' || year::text
            || 'm7 partition of ' || parent
            || ' for values from (''' || year::text || '-07-01'') to (''' || year::text || '-08-01'');';
        execute 'create table if not exists ' || parent || '_y' || year::text
            || 'm8 partition of ' || parent
            || ' for values from (''' || year::text || '-08-01'') to (''' || year::text || '-09-01'');';
        execute 'create table if not exists ' || parent || '_y' || year::text
            || 'm9 partition of ' || parent
            || ' for values from (''' || year::text || '-09-01'') to ('''
Re: DROP DATABASE is interruptible
Hi, On 2023-09-25 01:48:31 +0100, Peter Eisentraut wrote: > I noticed that this patch set introduced this pg_dump test: > > On 12.07.23 03:59, Andres Freund wrote: > > + 'CREATE DATABASE invalid...' => { > > + create_order => 1, > > + create_sql => q(CREATE DATABASE invalid; UPDATE pg_database SET > > datconnlimit = -2 WHERE datname = 'invalid'), > > + regexp => qr/^CREATE DATABASE invalid/m, > > + not_like => { > > + pg_dumpall_dbprivs => 1, > > + }, > > + }, > > But the key "not_like" isn't used for anything by that test suite. Maybe > "unlike" was meant? It's not clear to me either. Invalid databases shouldn't *ever* be dumped, so explicitly listing pg_dumpall_dbprivs is odd. TBH, I find this testsuite the most opaque in postgres... > But even then it would be useless because the "like" key is empty, so there > is nothing that "unlike" can subtract from. Was there something expected > from the mention of "pg_dumpall_dbprivs"? Not that I can figure out... > Perhaps it would be better to write out > > like => {}, > > explicitly, with a comment, like some other tests are doing. Yea, that looks like the right direction. I'll go and backpatch the adjustment. Greetings, Andres Freund
Re: Eager page freeze criteria clarification
On Fri, Sep 8, 2023 at 12:07 AM Andres Freund wrote:
> > Downthread, I proposed using the RedoRecPtr of the latest checkpoint
> > rather than the LSN of the previous vacuum. I still like that idea.
>
> Assuming that "downthread" references
> https://postgr.es/m/CA%2BTgmoYb670VcDFbekjn2YQOKF9a7e-kBFoj2WJF1HtH7YPaWQ%40mail.gmail.com
> could you sketch out the logic you're imagining a bit more?

I'm not exactly sure what the question is here. I mean, it doesn't quite make sense to just ask whether the page LSN is newer than the last checkpoint's REDO record, because I think that's basically just asking whether or not we would need an FPI, and the freeze criteria that Melanie has been considering incorporate that check in some other way already. But maybe some variant on that idea is useful - the distance to the second-most-recent checkpoint, or a multiple or percentage of the distance to the most-recent checkpoint, or whatever.

> The reason I was thinking of using the "lsn at the end of the last vacuum", is
> that it seems to be more adaptive to the frequency of vacuuming.

Yes, but I think it's *too* adaptive. The frequency of vacuuming can plausibly be multiple times per minute or not even annually. That's too big a range of variation. The threshold for freezing can vary by how actively the table or the page is updated, but I don't think it should vary by six orders of magnitude. Under what theory does it make sense to say "this row in table A hasn't been modified in 20 seconds, so let's freeze it, but this row in table B hasn't been modified in 8 months, so let's not freeze it because it might be modified again soon"? If you use the LSN at the end of the last vacuum, you're going to end up making decisions exactly like that, which seems wrong to me.

> Perhaps we can mix both approaches. We can use the LSN and time of the last
> vacuum to establish an LSN->time mapping that's reasonably accurate for a
> relation.
> For infrequently vacuumed tables we can use the time between
> checkpoints to establish a *more aggressive* cutoff for freezing than what a
> percent-of-time-since-last-vacuum approach would provide. If e.g. a table gets
> vacuumed every 100 hours and checkpoint timeout is 1 hour, no realistic
> percent-of-time-since-last-vacuum setting will allow freezing, as all dirty
> pages will be too new. To allow freezing a decent proportion of those, we
> could allow freezing pages that lived longer than ~20%
> time-between-recent-checkpoints.

Yeah, I don't know if that's exactly the right idea, but I think it's in the direction that I was thinking about. I'd even be happy with 100% of the time-between-recent checkpoints, maybe even 200% of time-between-recent checkpoints. But I think there probably should be some threshold beyond which we say "look, this doesn't look like it gets touched that much, let's just freeze it so we don't have to come back to it again later."

I think part of the calculus here should probably be that when the freeze threshold is long, the potential gains from making it even longer are not that much. If I change the freeze threshold on a table from 1 minute to 1 hour, I can potentially save uselessly freezing that page 59 times per hour, every hour, forever, if the page always gets modified right after I touch it. If I change the freeze threshold on a table from 1 hour to 1 day, I can only save 23 unnecessary freezes per day. Percentage-wise, the overhead of being wrong is the same in both cases: I can have as many extra freeze operations as I have page modifications, if I pick the worst possible times to freeze in every case. But in absolute terms, the savings in the second scenario are a lot less. I think if a user is accessing a table frequently, the overhead of jamming a useless freeze in between every table access is going to be a lot more noticeable than when the table is only accessed every once in a while.
And I also think it's a lot less likely that we'll reliably get it wrong. Workloads that touch a page and then touch it again ~N seconds later can exist for all values of N, but I bet they're way more common for small values of N than large ones. Is there also a need for a similar guard in the other direction? Let's say that autovacuum_naptime=15s and on some particular table it triggers every time. I've actually seen this on small queue tables. Do you think that, in such tables, we should freeze pages that haven't been modified in 15s? > Hm, possibly stupid idea: What about using shared_buffers residency as a > factor? If vacuum had to read in a page to vacuum it, a) we would need read IO > to freeze it later, as we'll soon evict the page via the ringbuffer b) > non-residency indicates the page isn't constantly being modified? This doesn't seem completely stupid, but I fear it would behave dramatically differently on a workload a little smaller than s_b vs. one a little larger than s_b, and that doesn't seem good. -- Robert Haas EDB: http://www.enterprisedb
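The diminishing-returns arithmetic quoted above can be checked directly. A small Python sketch of the worst case, where the page is re-dirtied immediately after every opportunistic freeze:

```python
# Worst-case count of useless freezes avoided by lengthening the freeze
# threshold: if the page is always modified right after being frozen, a
# threshold of T over a period P allows roughly P/T - 1 wasted freezes.

def max_wasted_freezes(period_seconds, threshold_seconds):
    """Worst case: the page is re-dirtied just after every freeze."""
    return period_seconds // threshold_seconds - 1

HOUR, DAY = 3600, 86400

# 1 minute -> 1 hour: up to 59 useless freezes per hour avoided.
assert max_wasted_freezes(HOUR, 60) == 59

# 1 hour -> 1 day: only 23 useless freezes per day avoided, even though
# the percentage overhead of being wrong is the same in both cases.
assert max_wasted_freezes(DAY, HOUR) == 23
```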
Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }
On Mon, 2023-09-25 at 12:00 -0400, Joe Conway wrote: > Should there be a way to have a separate "execution" search_path? I hadn't considered that and I like that idea for a few reasons: * a lot of the problem cases are for functions that don't need to access tables at all, e.g., in an index expression. * it avoids annoyances with pg_temp, because that's not searched for functions/operators anyway * perhaps we could force the object search_path to be empty for IMMUTABLE functions? I haven't thought it through in detail, but it seems like a promising approach. Regards, Jeff Davis
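A toy model of the separate "execution" search_path idea (Python, with an invented catalog representation): function and operator lookups consult one path, table lookups another, and an IMMUTABLE function's object path could be forced empty so it cannot depend on table resolution at all:

```python
# Sketch of split name resolution: one path for functions/operators
# (the "execution" path), a separate one for tables and other objects.
# Everything here is illustrative, not the actual namespace.c logic.

def resolve(name, catalogs, path):
    """Return the first schema-qualified match along the given path."""
    for schema in path:
        if name in catalogs.get(schema, {}):
            return schema + "." + name
    return None

catalogs = {
    "pg_catalog": {"lower": "func"},
    "myschema": {"accounts": "table", "lower": "func"},
}

function_path = ["pg_catalog"]   # execution (function/operator) path
object_path = ["myschema"]       # object (table) path

# Function names resolve via the execution path, so a user-created
# myschema.lower() cannot capture the call:
assert resolve("lower", catalogs, function_path) == "pg_catalog.lower"

# Tables still resolve via the object path:
assert resolve("accounts", catalogs, object_path) == "myschema.accounts"

# An IMMUTABLE function with an empty object path cannot see tables:
assert resolve("accounts", catalogs, []) is None
```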
Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }
On Mon, 2023-09-25 at 11:30 -0400, Robert Haas wrote: > On Fri, Sep 22, 2023 at 4:05 PM Jeff Davis wrote: > > You expect > > Bob to do something like: > > > > CREATE FUNCTION ... SET search_path = pg_catalog, pg_temp ... > > > > for all functions, not just SECURITY DEFINER functions, is that > > right? > > Yes, I do. Do users like Bob do that today? If not, what causes you to expect them to do so in the future? > I think it's self-evident that a SQL function's behavior is > not guaranteed to be invariant under all possible values of > search_path. It's certainly not self-evident in a literal sense. I think you mean that it's "obvious" or something, and perhaps that narrow question is, but it's also not terribly helpful. If the important behaviors here were so obvious, how did we end up in this mess in the first place? > > We've already established that even postgres hackers are having > > difficulty keeping up with these nuances. Even though the SET > > clause > > has been there for a long time, our documentation on the subject is > > insufficient and misleading. And on top of that, it's extra typing > > and > > noise for every schema file. Until we make some changes I don't > > think > > we can expect users to do as you suggest. > > Respectfully, I find this position unreasonable, to the point of > finding it difficult to take seriously. Which part exactly is unreasonable? * Hackers are having trouble keeping up with the nuances. * Our documentation on the subject *is* insufficient and misleading. * "pg_temp" is noise. It seems like you think that users are already doing "SET search_path = pg_catalog, pg_temp" in all the necessary places, and therefore no change is required? > Most of the problems that we're dealing with here have analogues in > the world of shell scripts. I think analogies to unix are what caused a lot of the problems we have today, because postgres is a very different world. 
In unix-like environments, a file is just a file; in postgres, we have all kinds of code attached in interesting ways. Regards, Jeff Davis
Re: XLog size reductions: Reduced XLog record header size for PG17
On Wed, 20 Sept 2023 at 07:06, Michael Paquier wrote: > > On Tue, Sep 19, 2023 at 12:07:07PM +0200, Matthias van de Meent wrote: > > V5 is a rebased version of v4, and includes the latest patch from > > "smaller XLRec block header" [0] as 0001. > > 0001 and 0007 are the meat of the changes. Correct. > -#define XLR_CHECK_CONSISTENCY 0x02 > +#define XLR_CHECK_CONSISTENCY (0x20) > > I can't help but notice that there are a few stylistic choices like > this one that are part of the patch. Using parenthesis in the case of > hexa values is inconsistent with the usual practices I've seen in the > tree. Yes, I'll take another look at that. > #define COPY_HEADER_FIELD(_dst, _size)\ > do {\ > -if (remaining < _size)\ > +if (remaining < (_size))\ > goto shortdata_err;\ > > There are a couple of stylistic changes like this one, that I guess > could just use their own patch to make these macros easier to use. They actually fix complaints of my IDE, but are otherwise indeed stylistic. > -#define XLogRecGetInfo(decoder) ((decoder)->record->header.xl_info) > +#define XLogRecGetInfo(decoder) ((decoder)->record->header.xl_info & > XLR_INFO_MASK) > +#define XLogRecGetRmgrInfo(decoder) (((decoder)->record->header.xl_info) & > XLR_RMGR_INFO_MASK) > > This stuff in 0002 is independent of 0001, am I right? Doing this > split with an extra macro is okay by me, reducing the presence of > XLR_INFO_MASK and bitwise operations based on it. Yes, that change is to stop making use of (~XLR_INFO_MASK) where XLR_RMGR_INFO_MASK is the correct bitmask (whilst also being quite useful in the later patch). > 0003 is also mechanical, but if you begin to enforce the use of > XLR_RMGR_INFO_MASK as the bits allowed to be passed down to the RMGR > identity callback, we should have at least a validity check to make > sure that nothing, even custom RMGRs, pass down unexpected bits? I think that's already handled in XLogInsert(), but I'll make sure to add more checks if they're not in place yet. 
> I am not convinced that XLOG_INCLUDE_XID is a good interface, TBH, and > I fear that people are going to forget to set it. Wouldn't it be > better to use an option where the XID is excluded instead, making the > inclusion of an XID the default? Most rmgrs don't actually use the XID. Only XACT, MULTIXACT, HEAP, HEAP2, and LOGICALMSG use the xid, so I thought it would be easier to just find the places where those RMGRs' records were being logged than to update all other places. I don't mind changing how we decide to log the XID, but I don't think EXCLUDE_XID is a good alternative: most records just don't need the transaction ID. There are many more index AMs with logging than table AMs, so I don't think it is that weird to default to 'not included'. > > The resource manager has ID = 0, thus requiring some special > > handling in other code. Apart from being generally useful, it is > > used in future patches to detect the end of WAL in lieu of a zero-ed > > fixed-size xl_tot_len field. > > Err, no, that may not be true. See for example this thread on the > topic of improving the checks of xl_tot_len and relying on this value > once a record header has been validated, even across page borders: > https://www.postgresql.org/message-id/17928-aa92416a70ff4...@postgresql.org Yes, there are indeed exceptions when reusing WAL segments, but it's still a good canary, like xl_tot_len before this patch. > Except that, in which cases could an invalid RMGR be useful? A sentinel value that is obviously invalid is available for several types, e.g. BlockNumber, TransactionId, XLogRecPtr, Buffer, and this is quite useful if you want to check if something is definitely invalid. I think that's fine in principle; we're already "wasting" some IDs in the gap between RM_MAX_BUILTIN_ID and RM_MIN_CUSTOM_ID.
In the current xlog infrastructure, we use xl_tot_len as that sentinel to detect whether a new record may exist, but in this patch that can't be used because the field may not exist and depends on other bytes. So I used xl_rmgr_id as the field to base the 'may a next record exist' checks on, which required the 0 rmgr ID to be invalid. Kind regards, Matthias van de Meent Neon (https://neon.tech)
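The "invalid rmgr ID as end-of-WAL canary" idea can be pictured with a toy record stream. The record layout below (1-byte ID, 1-byte length, payload) is invented for the example and is not the actual WAL format:

```python
# A reserved "invalid" ID used as an end-of-data canary, in the spirit
# of reserving rmgr ID 0: a zero-filled tail means no next record.

RM_INVALID_ID = 0

def read_records(buf):
    records, pos = [], 0
    while pos < len(buf):
        rmgr_id = buf[pos]
        if rmgr_id == RM_INVALID_ID:      # zero-filled tail: stop reading
            break
        length = buf[pos + 1]
        records.append((rmgr_id, bytes(buf[pos + 2:pos + 2 + length])))
        pos += 2 + length
    return records

# Two records followed by a zero-filled (e.g. recycled) tail.
stream = bytes([3, 2, 0xAA, 0xBB, 7, 1, 0xCC]) + bytes(8)
```

As the thread notes, recycled segments mean such a canary is a heuristic that must be backed by further validation, just as xl_tot_len was.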
Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }
On Mon, Sep 25, 2023 at 12:00 PM Joe Conway wrote: > Should there be a way to have a separate "execution" search_path? I have heard that idea proposed before, and I don't think it's a terrible idea, but I also don't think it fixes anything terribly fundamental. I think it's pretty normal for people to define functions and procedures and then call them from other functions and procedures, and if you do that, then you need that schema in your execution search path. Of course, if somebody doesn't do that, or schema-qualifies all such references, then this becomes useful for defense in depth. But I find it hard to see it as anything more than defense in depth because I think a lot of people will need to have use cases where they need to put non-system schemas into the execution search path, and such people wouldn't really benefit from the existence of this feature. Slightly off-topic, but I wonder whether, if we do this, we ought to do it by adding some kind of a marker to the existing search_path, rather than by creating a new GUC. For example, maybe putting & before a schema name means that it can be searched, but only for non-executable things. Then you could set search_path = &jconway, pg_catalog or something of that kind. It could potentially be more powerful to have it be a completely separate setting, but if we do that, everyone who currently needs to secure search_path properly starts needing to also secure execution_search_path properly. This is one of those cases where two is not better than one. -- Robert Haas EDB: http://www.enterprisedb.com
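Robert's `&` marker could be parsed along these lines — purely a sketch of the proposed (non-existent) syntax:

```python
# Sketch of splitting a search_path with the proposed "&" marker:
# schemas prefixed with & are searched for objects but not for
# executable things. This syntax is a proposal in this thread only.

def split_search_path(setting):
    object_path, execution_path = [], []
    for item in (s.strip() for s in setting.split(",")):
        if item.startswith("&"):
            object_path.append(item[1:])      # objects only
        else:
            object_path.append(item)          # searched for both
            execution_path.append(item)
    return object_path, execution_path

obj_path, exec_path = split_search_path("&jconway, pg_catalog")
```

One appeal of this design is the one Robert states: a single setting to secure, rather than two.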
Re: nbtree's ScalarArrayOp array mark/restore code appears to be buggy
On Sat, Sep 23, 2023 at 4:22 PM Peter Geoghegan wrote: > Attached draft patch shows how this could work. > > _bt_restore_array_keys() has comments that seem to suppose that > calling _bt_preprocess_keys is fairly expensive, and something that's > well worth avoiding. But...is it, really? I wonder if we'd be better > off just biting the bullet and always calling _bt_preprocess_keys > here. My current plan is to commit something close to my original patch on Wednesday or Thursday. My proposed fix is minimally invasive (it still allows _bt_restore_array_keys to avoid most calls to _bt_preprocess_keys), so I don't see any reason to delay acting here. -- Peter Geoghegan
Re: XLog size reductions: smaller XLRec block header for PG17
On Tue, 19 Sept 2023 at 01:03, Andres Freund wrote: > > Hi, > > On 2023-05-18 19:22:26 +0300, Heikki Linnakangas wrote: > > On 18/05/2023 17:59, Matthias van de Meent wrote: > > > It changes the block IDs used to fit in 6 bits, using the upper 2 bits > > > of the block_id field to store how much data is contained in the > > > record (0, <=UINT8_MAX, or <=UINT16_MAX bytes). > > > > Perhaps we should introduce a few generic inline functions to do varint > > encoding. That could be useful in many places, while this scheme is very > > tailored for XLogRecordBlockHeader. This scheme is later reused for the XLogRecord xl_tot_len field over at [0], so FWIW it does see use beyond this header. Sure, it's tailored to this WAL use case, but IMO we're getting good value from it. We don't use protobuf or JSON for WAL, we use our own serialization format. Having some specialized encoding/decoding in that format for certain fields is IMO quite acceptable. > Yes - I proposed that and wrote an implementation of reasonably efficient > varint encoding. Here's my prototype: > https://postgr.es/m/20221004234952.anrguppx5owewb6n%40awork3.anarazel.de As I mentioned on that thread, that prototype has a significant probability of doing nothing to improve WAL size, or even increasing the WAL size for installations which consume a lot of OIDs. > I think it's a bad tradeoff to write lots of custom varint encodings, just to > eke out a bit more space savings. This is only a single "custom" varint encoding though, if you can even call it that. It makes a field's size depend on flags set in another byte, which is not that much different from the existing use of XLR_BLOCK_ID_DATA_[LONG, SHORT]. > The increase in code complexity IMO makes it a bad tradeoff. Pardon me for asking, but what would you consider to be a good tradeoff then? I think the code relating to the WAL storage format is about as simple as you can get it within the feature set it provides and the size of the resulting records.
While I think there is still much to gain w.r.t. WAL record size, I don't think we can get much of those improvements without adding at least some amount of complexity, something I think to be true for most components in PostgreSQL. So, except for redesigning significant parts of the public WAL APIs, are we just going to ignore any potential improvements because they "increase code complexity"? Kind regards, Matthias van de Meent Neon (https://neon.tech) [0] https://commitfest.postgresql.org/43/4386/
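For reference, the kind of generic varint helper Heikki suggests usually looks like LEB128. The sketch below is illustrative only — it is not Andres's prototype, whose encoding differs in detail:

```python
# LEB128-style unsigned varint: 7 value bits per byte, high bit set on
# all but the last byte.

def varint_encode(n):
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # set continuation bit
        else:
            out.append(byte)
            return bytes(out)

def varint_decode(buf):
    n = shift = 0
    for i, byte in enumerate(buf):
        n |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return n, i + 1           # (value, bytes consumed)
        shift += 7
    raise ValueError("truncated varint")
```

This also shows the tradeoff Matthias mentions: values under 128 cost one byte, but a large OID can take more bytes than a fixed-width field would.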
Re: Improve tab completion for ALTER DEFAULT PRIVILEGE and ALTER TABLE
On Sun, 2 Jul 2023 at 20:42, vignesh C wrote: > > Hi, > > Improved tab completion for "ALTER DEFAULT PRIVILEGES" and "ALTER TABLE": > 1) The GRANT, REVOKE and FOR USER keywords were not displayed in tab > completion of ALTER DEFAULT PRIVILEGES, as in the statements below: > ALTER DEFAULT PRIVILEGES GRANT INSERT ON tables TO PUBLIC; > ALTER DEFAULT PRIVILEGES REVOKE INSERT ON tables FROM PUBLIC; > ALTER DEFAULT PRIVILEGES FOR USER vignesh REVOKE INSERT ON tables FROM dba1; > > 2) USER was not displayed for "ALTER DEFAULT PRIVILEGES IN SCHEMA > public FOR ", as in the statement below: > ALTER DEFAULT PRIVILEGES IN SCHEMA public FOR USER dba1 GRANT INSERT > ON TABLES TO PUBLIC; > > 3) "GRANT OPTION FOR" was not displayed for "ALTER DEFAULT PRIVILEGES > REVOKE ", as in the statement below: > ALTER DEFAULT PRIVILEGES REVOKE GRANT OPTION FOR SELECT ON tables FROM dba1; > > 4) "DATA TYPE" was missing in "ALTER TABLE table-name ALTER COLUMN > column-name SET", as in: > ALTER TABLE t1 ALTER COLUMN c1 SET DATA TYPE text; > > The attached patch has the changes for the same. Added a commitfest entry for this: https://commitfest.postgresql.org/45/4587/ Regards, Vignesh
Re: Improving btree performance through specializing by key shape, take 2
On Tue, 19 Sept 2023 at 22:49, Peter Geoghegan wrote: > > On Tue, Sep 19, 2023 at 6:28 AM Matthias van de Meent > wrote: > > > To be clear, page deletion does what I described here (it does an > > > in-place update of the downlink to the deleted page, so the same pivot > > > tuple now points to its right sibling, which is our page of concern), > > > in addition to fully removing the original pivot tuple whose downlink > > > originally pointed to our page of concern. This is why page deletion > > > makes the key space "move to the right", very much like a page split > > > would. > > > > I am still aware of this issue, and I think we've discussed it in > > detail earlier. I think it does not really impact this patchset. Sure, > > I can't use dynamic prefix compression to its full potential, but I > > still do get serious performance benefits: > > Then why have you linked whatever the first patch does with the high > key to dynamic prefix compression in the first place? Your commit > message makes it sound like it's a way to get around the race > condition that affects dynamic prefix compression, but as far as I can > tell it has nothing whatsoever to do with that race condition. We wouldn't have to store the downlink's right separator and compare it to the highkey if we didn't deviate from L&Y's algorithm for DELETE operations (which causes the race condition): just the right sibling's block number would be enough. (Yes, the right sibling's block number isn't available for the rightmost downlink of a page. In those cases, we'd have to reuse the parent page's high key with that of the downlink page, but I suppose that'll be relatively rare). > Questions: > > 1. Why shouldn't the high key thing be treated as an unrelated piece of work? Because it was only significant and relatively visible after getting rid of the other full key compare operations, and it touches essentially the same areas. Splitting them out in more patches would be a hassle. 
> I guess it's possible that it really should be structured that way, > but even then it's your responsibility to make it clear why that is. Sure. But I think I've made that clear upthread too. > As things stand, this presentation is very confusing. I'll take a look at improving the presentation. > 2. Separately, why should dynamic prefix compression be tied to the > specialization work? I also see no principled reason why it should be > tied to the other two things. My performance results show that insert performance degrades by 2-3% for single-column indexes if only the dynamic prefix truncation patch is applied [0]. The specialization patches fix that regression on my machine (5950x) due to having optimized code for the use case. I can't say for certain that other machines will see the same results, but I think results will at least be similar. > I didn't mind this sort of structure so much back when this work was > very clearly exploratory -- I've certainly structured work in this > area that way myself, in the past. But if you want this patch set to > ever go beyond being an exploratory patch set, something has to > change. I think it's fairly complete, and mostly waiting for review. > I don't have time to do a comprehensive (or even a fairly > cursory) analysis of which parts of the patch are helping, and which > are marginal or even add no value. It is a shame that you don't have the time to review this patch. > > > You'd have > > > to compare the lower bound separator key from the parent (which might > > > itself be the page-level low key for the parent) to the page low key. > > > That's not a serious suggestion; I'm just pointing out that you need > > > to be able to compare like with like for a canary condition like this > > > one, and AFAICT there is no lightweight practical way of doing that > > > that is 100% robust.
> > > > True, if we had consistent LOWKEYs on pages, that'd make this job much > > easier: the prefix could indeed be carried over in full. But that's > > not currently the case for the nbtree code, and this is the next best > > thing, as it also has the benefit of working with all currently > > supported physical formats of btree indexes. > > I went over the low key thing again because I had to struggle to > understand what your high key optimization had to do with dynamic > prefix compression. I'm still struggling. I think that your commit > message very much led me astray. Quoting it here: > > """ > Although this limits [...] relatively expensive _bt_compare. > """ > > You're directly tying the high key optimization to the dynamic prefix > compression optimization. But why? The value of skipping the _bt_compare call on the highkey is relatively much higher in the prefix-skip case than it is on master, as on master it's only one of the log(n) _bt_compare calls on the page, while in the patch it's one of (on average) 3 full key _bt_compare calls. This makes it much easier to prove the performance gain, which made me integrate
Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }
On 9/25/23 11:30, Robert Haas wrote: I don't believe that people want to run their functions under a sanitized search_path that only includes system schemas. That might work for some people, but I think most people will define functions that call other functions that they themselves defined, or access tables that they themselves created. They will therefore need the search_path to include the schemas in which they created those objects. Without diving into all the detailed nuances of this discussion, this particular paragraph made me wonder if at least part of the problem here is that the same search_path is used to find "things that I want to execute" (functions and operators) and "things I want to access" (tables, etc). I think many folks would be well served by only having system schemas in the search_path for the former (augmented by explicit schema qualifying of one's own functions), but agree that almost no one wants that for the latter (needing to schema qualify every table reference). Should there be a way to have a separate "execution" search_path? -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Why need a lock?
jacktby jacktby writes: > I found this in the source code: > #define ShareLock 5 /* CREATE INDEX > (WITHOUT CONCURRENTLY) */ > If there is no CONCURRENTLY, why do we need a lock? It seems > unnecessary to me. What's the point? We need a lock precisely to prevent concurrent execution (of table modifications that would confuse the index build). regards, tom lane
Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }
On Fri, Sep 22, 2023 at 4:05 PM Jeff Davis wrote: > On Thu, 2023-09-21 at 14:06 -0400, Robert Haas wrote: > > Also, in a case like this, I don't think it's unreasonable to ask > > whether, perhaps, Bob just needs to be a little more careful about > > setting search_path. > > That's what this whole thread is about: I wish it was reasonable, but I > don't think the tools we provide today make it reasonable. You expect > Bob to do something like: > > CREATE FUNCTION ... SET search_path = pg_catalog, pg_temp ... > > for all functions, not just SECURITY DEFINER functions, is that right? Yes, I do. I think it's self-evident that a SQL function's behavior is not guaranteed to be invariant under all possible values of search_path. If you care about your function behaving the same way all the time, you have to set the search_path. TBH, I don't see any reasonable way around that requirement. We can perhaps provide some safeguards that will make it less likely that you will get completely hosed if you forget, and we could decide to make SET search_path or some mostly-equivalent thing the default at the price of a pretty large compatibility break, but you can't have functions that both resolve object references using the caller's search path and also reliably do what the author intended. > > You can, I think, be expected to > > check that functions you define have SET search_path attached. > > We've already established that even postgres hackers are having > difficulty keeping up with these nuances. Even though the SET clause > has been there for a long time, our documentation on the subject is > insufficient and misleading. And on top of that, it's extra typing and > noise for every schema file. Until we make some changes I don't think > we can expect users to do as you suggest. Respectfully, I find this position unreasonable, to the point of finding it difficult to take seriously.
You said in another part of your email that I didn't quote here that it's a problem that functions and procedures are created with public execute access by default -- but you can work around this by using a schema to which other users don't have access, or by changing the default permissions for functions on the schema where you are creating them, or by adjusting permissions on the individual objects. If you don't do any of that but don't trust the other users on your system then you at least need to set search_path. If you neither understand how function permissions work nor understand the importance of controlling search_path, you cannot expect to have a secure system with multiple, mutually untrusting users. That's just never going to work, regardless of what the server behavior is. I also disagree with the idea that setting the search_path should be regarded as noise. I think it's quite the opposite. I don't believe that people want to run their functions under a sanitized search_path that only includes system schemas. That might work for some people, but I think most people will define functions that call other functions that they themselves defined, or access tables that they themselves created. They will therefore need the search_path to include the schemas in which they created those objects. There's no way for the system to magically figure out what the user wants here. *Perhaps* if the function is defined interactively the then-current value could be captured, but in a pg_dump for example that won't work, and the configured value, wherever it came from initially, is going to have to be recorded so that it can be recreated when the dump is restored. Most of the problems that we're dealing with here have analogues in the world of shell scripts. A sql or plpgsql function is like a shell script. If it's setuid, i.e. SECURITY DEFINER, you have to worry about the caller hijacking it by setting PATH or IFS or LD_something.
Even if it isn't, you have to either trust that the caller has set a reasonable PATH, or set one yourself, else your script isn't always going to work reliably. Nobody really expects to be able to make a setuid shell script secure at all -- that typically requires a wrapper executable -- but it definitely can't be done by someone who doesn't understand the importance of setting their PATH and has no idea how to use chmod. One thing that is quite different between the shell script situation and what we do inside PostgreSQL is that there's a lot more security by default. Every user gets a home directory which by default is accessible only to them, or at the very least writable only by them, and system directories have tightly-controlled permissions. I think UNIX had analogues of a lot of the problems we have today 40 years ago, but they've tightened things up. We've started to move in that direction by, for example, removing public execute access by default. If we want to move further in the direction that UNIX has taken, we should probably get rid of the public schema altogether, and auto-c
Re: Confused about gram.y referencs in Makefile?
Daniel Gustafsson writes: > On 25 Sep 2023, at 05:34, Japin Li wrote: >> How about "See gram.h target's comment in src/backend/parser/Makefile" >> or just "See src/backend/parser/Makefile"? > The latter seems more stable, if the Makefile is ever restructured it's almost > guaranteed that this comment will be missed with the location info becoming > stale. I did it like this: # Note that while each script call produces two output files, to be -# parallel-make safe we need to split this into two rules. (See for -# example gram.y for more explanation.) +# parallel-make safe we need to split this into two rules. (See notes +# in src/backend/parser/Makefile about rules with multiple outputs.) # There are a whole lot of other cross-references to that same comment, and they all look like # See notes in src/backend/parser/Makefile about the following two rules I considered modifying all of those as well, but decided it wasn't really worth the trouble. The Makefiles' days are numbered I think. regards, tom lane
Re: Remove MSVC scripts from the tree
On 2023-09-21 Th 21:12, Michael Paquier wrote: As of today, I can see that the only buildfarm members relying on these scripts are bowerbird and hamerkop, so these two would fail if the patch attached were to be applied today. I am adding Andrew D. and the maintainers of hamerkop (SRA-OSS) in CC for comments. Changing bowerbird to use meson should not be difficult, just needs some TUITs. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: Doc: vcregress .bat commands list lacks "taptest"
On 2023-09-25 Mo 03:14, Daniel Gustafsson wrote: On 25 Sep 2023, at 08:58, Michael Paquier wrote: I would say yes, but with a backpatch. Agreed. +1 cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: Regression in COPY FROM caused by 9f8377f7a2
On 2023-09-25 Mo 04:59, Laurenz Albe wrote: On Mon, 2023-09-25 at 09:54 +0200, Laurenz Albe wrote: In v16 and later, the following fails: CREATE TABLE boom (t character varying(5) DEFAULT 'a long string'); COPY boom FROM STDIN; ERROR: value too long for type character varying(5) In PostgreSQL v15 and earlier, the COPY statement succeeds. The error is thrown in BeginCopyFrom in line 1578 (HEAD) defexpr = expression_planner(defexpr); Bisecting shows that the regression was introduced by commit 9f8377f7a2, which introduced DEFAULT values for COPY FROM. Oops :-( I suggest the attached fix, which evaluates default values only if the DEFAULT option was specified or if the column does not appear in the column list of COPY. Patch looks reasonable, haven't tested yet. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
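The logic of the proposed fix — evaluating a column's default only when COPY FROM can actually use it — can be sketched like this (function and variable names invented for illustration, not the actual patch):

```python
# Evaluate a default only if the column is absent from the COPY column
# list, or the DEFAULT option names it. Otherwise a broken default for
# a column that COPY supplies anyway would needlessly raise, as in the
# reported regression.

def plan_defaults(table_defaults, copy_columns, default_option_cols):
    evaluated = {}
    for col, default_expr in table_defaults.items():
        if col not in copy_columns or col in default_option_cols:
            evaluated[col] = default_expr()   # may raise, e.g. value too long
    return evaluated

def bad_default():
    raise ValueError('value too long for type character varying(5)')

# Column "t" is supplied by COPY and DEFAULT isn't used: the broken
# default is never evaluated, so the COPY succeeds as in v15.
result = plan_defaults({"t": bad_default}, copy_columns={"t"},
                       default_option_cols=set())

# If COPY had to fall back on the default, evaluation (and the error)
# would still happen:
copy_fails = False
try:
    plan_defaults({"t": bad_default}, copy_columns=set(),
                  default_option_cols=set())
except ValueError:
    copy_fails = True
```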
Why need a lock?
I found this in the source code: #define ShareLock 5 /* CREATE INDEX (WITHOUT CONCURRENTLY) */ If there is no CONCURRENTLY, why do we need a lock? It seems unnecessary to me. What's the point?
Re: [PATCH] Extend ALTER OPERATOR to support adding commutator, negator, hashes, and merges
In doc/src/sgml/ref/alter_operator.sgml: HASHES Indicates this operator can support a hash join. Can only be set when the operator does not support a hash join. MERGES Indicates this operator can support a merge join. Can only be set when the operator does not support a merge join. If the operator does not support a hash/merge join, it can't do ALTER OPERATOR oprname(left_arg, right_arg) SET (HASHES/MERGES = false); so I think the docs should say: can only be set when the operator does support a hash/merge join. Once set to true, it cannot be reset to false.
Re: Synchronizing slots from primary to standby
Hi, On 9/25/23 10:44 AM, Drouvot, Bertrand wrote: Hi, On 9/23/23 3:38 AM, Amit Kapila wrote: On Fri, Sep 22, 2023 at 6:01 PM Drouvot, Bertrand wrote: There is a difference here that we also need to prevent removal of rows required by sync_slots. That could be achieved by a physical slot (and hot_standby_feedback). So, having a requirement to have a physical slot doesn't sound too unreasonable to me. Otherwise, we need to invent some new mechanism of having some sort of placeholder slot to avoid removal of required rows. Thinking about it, I wonder if removal of required rows is even possible given that: - we don't allow logical decoding from a sync slot - a sync slot's catalog_xmin <= its primary counterpart's catalog_xmin - its primary counterpart prevents row removal thanks to its own catalog_xmin - a sync slot is removed as soon as its primary counterpart is removed In that case I'm not sure how row removal on the primary could lead to removing rows required by a sync slot. Am I missing something? Do you have a scenario in mind? Please forget the above questions; it's in fact pretty easy to remove rows on the primary that would be needed by a sync slot. I do agree that having a requirement to have a physical slot does not sound unreasonable then. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: bug fix and documentation improvement about vacuumdb
> On 24 Sep 2023, at 10:22, Kuwamura Masaki > wrote: > > LGTM too! I've applied this down to v16 now, thanks for the submission! -- Daniel Gustafsson
RE: Synchronizing slots from primary to standby
Dear Ajin, Shveta, Thank you for rebasing the patch set! Here are new comments for v19_2-0001. 01. WalSndWaitForStandbyNeeded() ``` if (SlotIsPhysical(MyReplicationSlot)) return false; ``` Is there a possibility that physical walsenders call this function? IIUC the following is a stack trace for the function, so only logical walsenders use it. If so, it should be an Assert() instead of an if statement. logical_read_xlog_page() WalSndWaitForWal() WalSndWaitForStandbyNeeded() 02. WalSndWaitForStandbyNeeded() Can we set shouldwait in SlotSyncInitConfig()? synchronize_slot_names_list is searched whenever the function is called, but it does not change automatically. If the slot name were compared with the list in SlotSyncInitConfig(), the linear search could be reduced. 03. WalSndWaitForStandbyConfirmation() We should add ProcessRepliesIfAny() during the loop, otherwise the walsender overlooks the death of an apply worker. 04. WalSndWaitForStandbyConfirmation() Not sure, but do we have to return early if walsenders get the PROCSIG_WALSND_INIT_STOPPING signal? I thought that if physical walsenders get stuck, logical walsenders wait forever. At that time we cannot stop the primary server even if "pg_ctl stop" is executed. 05. SlotSyncInitConfig() Why don't we free the memory for rawname, the old standby_slot_names_list, and synchronize_slot_names_list? They seem to be overwritten. 06. SlotSyncInitConfig() Both physical and logical walsenders call the function, but physical ones do not use the lists, right? If so, can we add a quick exit for physical walsenders? Or we should carefully remove the places where physical walsenders call it. 07. StartReplication() I think we do not have to call SlotSyncInitConfig(). An alternative approach is described above. 08. Other Also, I found unexpected behavior after both 0001 and 0002 were applied. Is it normal or not? 1. constructed the setup below (ensured that a logical slot existed on the secondary) 2. stopped the primary 3. promoted the secondary server 4.
disabled a subscription once 5. changed the connection string for the subscriber 6. inserted data into the new primary 7. enabled the subscription again 8. got an ERROR: replication slot "sub" does not exist I expected that logical replication would restart, but it did not. Was it a real issue or my fault? The error appears in secondary.log. ``` Setup: primary--->secondary | | subscriber ``` Best Regards, Hayato Kuroda FUJITSU LIMITED test_0925.sh Description: test_0925.sh
Re: Index range search optimization
On Mon, Sep 25, 2023 at 12:58 PM Pavel Borisov wrote: > One more doubt about naming. Calling the function > _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts, > ScanDirection dir, bool *continuescan, bool requiredMatchedByPrecheck) > as > (void) _bt_checkkeys(scan, itup, indnatts, dir, > &requiredMatchedByPrecheck, false); > looks a little misleading because of the coincidence of the names of > the 5th and 6th arguments. I've added a comment clarifying this argument usage. -- Regards, Alexander Korotkov 0001-Skip-checking-of-scan-keys-required-for-direction-v6.patch Description: Binary data
Re: Avoid a possible out-of-bounds access (src/backend/optimizer/util/relnode.c)
On Mon, Sep 25, 2023 at 08:23, Ashutosh Bapat < ashutosh.bapat@gmail.com> wrote: > On Sat, Sep 23, 2023 at 7:29 PM Ranier Vilela wrote: > > > > Hi, > > > > Per Coverity. > > CID 1518088 (#2 of 2): Improper use of negative value (NEGATIVE_RETURNS) > > > > The function bms_singleton_member can return a negative number. > > > > /* > > * Get a child rel for rel2 with the relids. See above comments. > > */ > > if (rel2_is_simple) > > { > > int varno = bms_singleton_member(child_relids2); > > > > child_rel2 = find_base_rel(root, varno); > > } > > > > It turns out that in the get_matching_part_pairs function (joinrels.c), > the return of bms_singleton_member is passed to the find_base_rel function, > which cannot receive a negative value. > > > > find_base_rel is protected by an Assertion, which effectively indicates > that the error does not occur in tests and in DEBUG mode. > > > > But this does not change the fact that bms_singleton_member can return a > negative value, which may occur on some production servers. > > > > Fix by changing the Assertion into a real test, to protect the > simple_rel_array array. > > Do you have a scenario where bms_singleton_member() actually returns a > negative number, or is it just a possibility? Just a possibility. > bms_make_singleton() has an > assertion at the end: Assert(result >= 0); > bms_make_singleton(), bms_add_member() explicitly forbid negative > values. It looks like we have covered all the places which can add a > negative value to a bitmapset. Maybe we are missing a place or two. > It will be good to investigate it. > I tried to investigate, mostly with runtime compilation. As previously stated, the error does not appear in the tests. That said, although the Assert protects in most cases, that doesn't mean the problem can't occur in a query running on a server in production mode. Now, thinking about what you said about the Assertion in bms_make_singleton: I think it's nonsense, no?
Why design a function that in DEBUG mode prohibits negative returns, but in runtime mode allows them? After all, why allow a negative return if for all practical purposes it is prohibited? Regarding the find_base_rel function, it is nonsense to protect the array with an Assertion. After all, we have already protected the upper limit with a real test, so why not also protect the lower limit? The additional test is cheap and makes perfect sense, making the function more robust in production mode. As an added bonus, modern compilers will probably be able to remove the additional test if they deem it unnecessary. Furthermore, all protections that were added to guard find_base_rel calls can eventually be removed, since find_base_rel will safely handle negative values itself. best regards, Ranier Vilela
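For illustration, the change being argued for amounts to guarding the lower bound with a runtime test, just like the upper bound. The following is a toy sketch, not the actual planner code: the struct definitions and the error path (stderr/exit instead of elog) are simplified stand-ins.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-ins for the planner structs in nodes/pathnodes.h */
typedef struct RelOptInfo
{
    int relid;
} RelOptInfo;

typedef struct PlannerInfo
{
    RelOptInfo **simple_rel_array;
    int          simple_rel_array_size;
} PlannerInfo;

/*
 * Proposed shape of the fix: the lower bound is checked at runtime,
 * so a negative relid (e.g. from bms_singleton_member() misbehaving)
 * fails cleanly in production builds too, instead of indexing before
 * the start of the array.
 */
static RelOptInfo *
find_base_rel_checked(PlannerInfo *root, int relid)
{
    if (relid <= 0 || relid >= root->simple_rel_array_size)
    {
        /* the real code would raise elog(ERROR, ...) here */
        fprintf(stderr, "no relation entry for relid %d\n", relid);
        exit(EXIT_FAILURE);
    }
    return root->simple_rel_array[relid];
}
```

In the real function an out-of-range relid would raise an ERROR rather than terminating the process; the point is only that both bounds get a real test.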
Re: generic plans and "initial" pruning
On Wed, Sep 6, 2023 at 11:20 PM Robert Haas wrote: > On Wed, Sep 6, 2023 at 5:12 AM Amit Langote wrote: > > Attached updated patches. Thanks for the review. > > I think 0001 looks ready to commit. I'm not sure that the commit > message needs to mention future patches here, since this code cleanup > seems like a good idea regardless, but if you feel otherwise, fair > enough. OK, I will remove the mention of future patches. > On 0002, some questions: > > - In ExecEndLockRows, is the call to EvalPlanQualEnd a concern? i.e. > Does that function need any adjustment? I think it does with the patch as it stands. It needs to have an early exit at the top if parentestate is NULL, which it would be if EvalPlanQualInit() wasn't called from an ExecInit*() function. Though, as I answer below your question as to whether there is actually any need to interrupt all of the ExecInit*() routines, nothing needs to change in ExecEndLockRows(). > - In ExecEndMemoize, should there be a null-test around > MemoryContextDelete(node->tableContext) as we have in > ExecEndRecursiveUnion, ExecEndSetOp, etc.? Oops, you're right. Added. > I wonder how we feel about setting pointers to NULL after freeing the > associated data structures. The existing code isn't consistent about > doing that, and making it do so would be a fairly large change that > would bloat this patch quite a bit. On the other hand, I think it's a > good practice as a general matter, and we do do it in some ExecEnd > functions. I agree that it might be worthwhile to take the opportunity and make the code more consistent in this regard. So, I've included those changes too in 0002. > On 0003, I have some doubt about whether we really have all the right > design decisions in detail here: > > - Why have this weird rule where sometimes we return NULL and other > times the planstate? Is there any point to such a coding rule? Why not > just always return the planstate? > > - Is there any point to all of these early exit cases? 
For example, in > ExecInitBitmapAnd, why exit early if initialization fails? Why not > just plunge ahead and if initialization failed the caller will notice > that and when we ExecEndNode some of the child node pointers will be > NULL but who cares? The obvious disadvantage of this approach is that > we're doing a bunch of unnecessary initialization, but we're also > speeding up the common case where we don't need to abort by avoiding a > branch that will rarely be taken. I'm not quite sure what the right > thing to do is here. > > - The cases where we call ExecGetRangeTableRelation or > ExecOpenScanRelation are a bit subtler ... maybe initialization that > we're going to do later is going to barf if the tuple descriptor of > the relation isn't what we thought it was going to be. In that case it > becomes important to exit early. But if that's not actually a problem, > then we could apply the same principle here also -- don't pollute the > code with early-exit cases, just let it do its thing and sort it out > later. Do you know what the actual problems would be here if we didn't > exit early in these cases? > > - Depending on the answers to the above points, one thing we could > think of doing is put an early exit case into ExecInitNode itself: if > (unlikely(!ExecPlanStillValid(whatever)) return NULL. Maybe Andres or > someone is going to argue that that checks too often and is thus too > expensive, but it would be a lot more maintainable than having similar > checks strewn throughout the ExecInit* functions. Perhaps it deserves > some thought/benchmarking. More generally, if there's anything we can > do to centralize these checks in fewer places, I think that would be > worth considering. The patch isn't terribly large as it stands, so I > don't necessarily think that this is a critical issue, but I'm just > wondering if we can do better. 
I'm not even sure that it would be too > expensive to just initialize the whole plan always, and then just do > one test at the end. That's not OK if the changed tuple descriptor (or > something else) is going to crash or error out in a funny way or > something before initialization is completed, but if it's just going > to result in burning a few CPU cycles in a corner case, I don't know > if we should really care. I thought about this some and figured that adding the is-CachedPlan-still-valid tests in the following places should suffice after all:

1. In InitPlan() right after the top-level ExecInitNode() calls
2. In ExecInit*() functions of Scan nodes, right after ExecOpenScanRelation() calls

CachedPlans can only become invalid because of concurrent changes to the inheritance child tables referenced in the plan. Only the following schema modifications of child tables can be performed concurrently:

* Addition of a column (allowed only if traditional inheritance child)
* Addition of an index
* Addition of a non-index constraint
* Dropping of a child table (allowed only if traditional inheritance child)
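The centralized early-exit idea discussed above can be pictured with a toy dispatcher. Everything below (FakeEState, fake_exec_init_node, the plan_still_valid flag) is a hypothetical stand-in for the real executor API, only meant to show the shape of one check at the top instead of checks strewn through every ExecInit* function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real Plan/PlanState/EState types live in
 * the executor headers. */
typedef struct FakeEState
{
    bool plan_still_valid;          /* ExecPlanStillValid() stand-in */
} FakeEState;

typedef struct FakePlan
{
    struct FakePlan *lefttree;
} FakePlan;

typedef struct FakePlanState
{
    struct FakePlanState *lefttree;
} FakePlanState;

/*
 * One early-exit check at the top of the dispatcher.  Returning NULL
 * signals "the CachedPlan became invalid, abandon initialization";
 * callers only need to propagate NULL upward.
 */
static FakePlanState *
fake_exec_init_node(FakePlan *plan, FakeEState *estate)
{
    if (!estate->plan_still_valid)
        return NULL;

    FakePlanState *ps = calloc(1, sizeof(FakePlanState));

    if (plan->lefttree)
    {
        ps->lefttree = fake_exec_init_node(plan->lefttree, estate);
        if (ps->lefttree == NULL)
        {
            /* a child bailed out: propagate the invalidation upward */
            free(ps);
            return NULL;
        }
    }
    return ps;
}
```

Whether one branch per node is acceptable overhead is exactly the benchmarking question raised above; the sketch only shows the maintainability side of the trade-off.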
Re: On login trigger: take three
> On 25 Sep 2023, at 11:13, Alexander Korotkov wrote: > I'd like to do a short summary of > design issues on this thread. Thanks for summarizing this long thread! > the patch for the GUC option to disable > all event triggers resides in a separate thread [4]. Apparently that > patch should be committed first [5]. I have committed the prerequisite patch for temporarily disabling EVTs via a GUC in 7750fefdb2. We should absolutely avoid creating any more dependencies on single-user mode (yes, I have changed my mind since the beginning of the thread). > 3. Yet another question is connection-time overhead introduced by this > patch. The overhead estimate varies from no measurable overhead [6] to > 5% overhead [7]. In order to overcome that, [8] has introduced a > database-level flag indicating whether there are connection triggers. > Later this flag was removed [9] in a hope that the possible overhead > is acceptable. While I disliked the flag, I do think the overhead is problematic. Last time I profiled it I found it noticeable, and it seems expensive for such a niche feature to impact such a hot path. Maybe you can think of other ways to reduce the cost here (if it indeed is an issue in the latest version of the patch, which is not one I've benchmarked)? > 5. It was also pointed out [11] that ^C in psql doesn't cancel > long-running client connection triggers. That might be considered a > psql problem though. While it is a psql problem, it's exacerbated by a slow login EVT and it breaks what I would guess is the mental model of many who press ^C in a stalled login. At the very least we should probably document the risk? -- Daniel Gustafsson
Re: GUC for temporarily disabling event triggers
> On 25 Sep 2023, at 09:52, Daniel Gustafsson wrote: > >> On 25 Sep 2023, at 09:50, Michael Paquier wrote: >> >> On Mon, Sep 25, 2023 at 09:19:35AM +0200, Daniel Gustafsson wrote: >>> Fair enough, although I used "fire" instead of "run" which is consistent >>> with >>> the event trigger documentation. >> >> Okay by me. > > Great, I'll go ahead and apply this version then. Thanks for review! And applied, closing the CF entry. -- Daniel Gustafsson
Re: Various small doc improvements; plpgsql, schemas, permissions, oidvector
> On 25 Sep 2023, at 14:00, Karl O. Pinc wrote: > Is there a preferred data format or should I send > each patch in a separate attachment with description? The easiest way would be to create a patchset off of master I think. In a branch, commit each change with an explanatory commit message. Once done you can do "git format-patch origin/master -v 1" which will generate a set of n patches named v1-0001 through v1-000n. You can then attach those to the thread. This will make it easier for a reviewer, and it's easy to apply them in the right order in case one change depends on another earlier change. -- Daniel Gustafsson
Re: Various small doc improvements; plpgsql, schemas, permissions, oidvector
On Mon, 25 Sep 2023 09:26:38 +0200 Daniel Gustafsson wrote: > > On 25 Sep 2023, at 00:57, Karl O. Pinc wrote: > > > I have made various, mostly unrelated to each other, > > small improvements to the documentation. > While I agree it's subjective, I don't think adding a new section or > paragraph qualifies as short or small. I would prefer if each > (related) change is in a single commit with a commit message which > describes the motivation for the change. A reviewer can second-guess > the rationale for the changes, but they shouldn't have to. Will do. Is there a preferred data format or should I send each patch in a separate attachment with description? > The resulting patchset can all be in the same thread though. Thanks. Regards, Karl Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
Re: Build the docs if there are changes in docs and don't run other tasks if the changes are only in docs
Hi, I attached the second version of the patch. On Mon, 11 Sept 2023 at 15:11, Daniel Gustafsson wrote: > > > On 11 Sep 2023, at 13:03, Nazir Bilal Yavuz wrote: > > >> Almost, but not entirely. There are a set of scripts which generate > >> content > >> for the docs based on files in src/, like > >> src/backend/catalog/sql_features.txt > >> and src/include/parser/kwlist.h. If those source files change, or their > >> scripts, it would be helpful to build docs. > > > > Thanks! Are these the only files that are not in the doc subfolders > > but effect docs? > > I believe so, the list of scripts and input files can be teased out of the > doc/src/sgml/meson.build file. Now the files mentioned in the doc/src/sgml/meson.build file will trigger building the docs task. Also, if the changes are only in the README files, CI will be skipped completely. Any kind of feedback would be appreciated. Regards, Nazir Bilal Yavuz Microsoft From 843d0c132c0f95dfee50eaaf667a92fffe400eeb Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Mon, 11 Sep 2023 13:29:33 +0300 Subject: [PATCH v2 2/2] Skip CI if changes are only in docs or in the READMEs When changes are only in docs or in the READMEs, skip all the tasks. This skip is overriden in 'SanityCheck' and 'Build the Docs' tasks. So, these two tasks might have not been skipped. --- .cirrus.tasks.yml | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml index d7c45224af9..ec6bcd5af70 100644 --- a/.cirrus.tasks.yml +++ b/.cirrus.tasks.yml @@ -3,6 +3,11 @@ # For instructions on how to enable the CI integration in a repository and # further details, see src/tools/ci/README +# When changes are only in docs or in the READMEs, skip all the tasks. +# +# This skip is overriden in 'SanityCheck' and 'Build the Docs' tasks. So, +# these two tasks might have not been skipped. +skip: changesIncludeOnly('doc/**', '**/README') env: # The lower depth accelerates git clone. 
Use a bit of depth so that @@ -54,12 +59,12 @@ on_failure_meson: &on_failure_meson # broken commits, have a minimal task that all others depend on. task: name: SanityCheck - - # If a specific OS is requested, don't run the sanity check. This shortens + # If a specific OS is requested or if there are changes only in the docs + # or in the READMEs, don't run the sanity check. This shortens # push-wait-for-ci cycle time a bit when debugging operating system specific # failures. Uses skip instead of only_if, as cirrus otherwise warns about # only_if conditions not matching. - skip: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:.*' + skip: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:.*' || changesIncludeOnly('doc/**', '**/README') env: CPUS: 4 -- 2.40.1 From 31fcd0c5b588e9c4ef968e7a92489cfb86740228 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Thu, 7 Sep 2023 17:58:06 +0300 Subject: [PATCH v2 1/2] Only built the docs if there are changes are in the docs Building the docs triggered although there are no changes in the docs. So, the new 'Building the Docs' task is created. This task only run if a specific OS is not requested and if there are changes in docs or in the CI files. 
--- .cirrus.tasks.yml | 77 ++- 1 file changed, 62 insertions(+), 15 deletions(-) diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml index e137769850d..d7c45224af9 100644 --- a/.cirrus.tasks.yml +++ b/.cirrus.tasks.yml @@ -740,21 +740,6 @@ task: make -s -j${BUILD_JOBS} clean time make -s -j${BUILD_JOBS} world-bin - ### - # Verify docs can be built - ### - # XXX: Only do this if there have been changes in doc/ since last build - always: -docs_build_script: | - time ./configure \ ---cache gcc.cache \ -CC="ccache gcc" \ -CXX="ccache g++" \ -CLANG="ccache clang" - make -s -j${BUILD_JOBS} clean - time make -s -j${BUILD_JOBS} -C doc - - ### # Verify headerscheck / cpluspluscheck succeed # # - Don't use ccache, the files are uncacheable, polluting ccache's @@ -777,3 +762,65 @@ task: always: upload_caches: ccache + + +task: + name: Build the Docs + depends_on: SanityCheck + # Only run if a specific OS is not requested and if there are changes in docs + # or in the CI files. + skip: > +$CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:.*' || +!changesInclude('doc/**', +'.cirrus.yml', +'.cirrus.tasks.yml', +'src/backend/catalog/sql_feature_packages.txt', +'src/backend/catalog/sql_features.txt', +'src/backend/utils/errcodes.txt', +'src/backend/utils/activity/wait_event_names.txt', +'src/backend/utils/activity/generate-wait_event_types.pl', +'src/include/parser/kwlist.h') + + env: +CPUS: 4 +BUILD_JOBS: 4 +IMAGE_FAMILY: pg-ci-bullseye +
Re: [PoC] pg_upgrade: allow to upgrade publisher node
On Mon, Sep 25, 2023 at 1:23 PM Bharath Rupireddy wrote: > > On Mon, Sep 25, 2023 at 12:32 PM Dilip Kumar wrote: > > > > > > Is there anything else that stops this patch from supporting migration > > > > of logical replication slots from PG versions < 17? > > > > > > IMHO one of the main change we are doing in PG 17 is that on shutdown > > > checkpoint we are ensuring that if the confirmed flush lsn is updated > > > since the last checkpoint and that is not yet synched to the disk then > > > we are doing so. I think this is the most important change otherwise > > > many slots for which we have already streamed all the WAL might give > > > an error assuming that there are pending WAL from the slots which are > > > not yet confirmed. > > > > > > > You might need to refer to [1] for the change I am talking about > > > > [1] > > https://www.postgresql.org/message-id/CAA4eK1%2BLtWDKXvxS7gnJ562VX%2Bs3C6%2B0uQWamqu%3DUuD8hMfORg%40mail.gmail.com > > I see. IIUC, without that commit e0b2eed [1], it may happen that the > slot's on-disk confirmed_flush LSN value can be higher than the WAL > LSN that's flushed to disk, no? > No, without that commit, there is a very high possibility that even if we have sent the WAL to the subscriber and got the acknowledgment of the same, we would miss updating it before shutdown. This would lead to upgrade failures because upgrades have no way to later identify whether the remaining WAL records are sent to the subscriber. > If so, can't it be detected if the WAL > at confirmed_flush LSN is valid or not when reading WAL with > xlogreader machinery? > > What if the commit e0b2eed [1] is treated to be fixing a bug with the > reasoning [2] and backpatch? When done so, it's easy to support > upgradation/migration of logical replication slots from PG versions < > 17, no? > Yeah, we could try to make a case to backpatch it but when I raised that point there was not much consensus on backpatching it. 
We are aware and understand that if we could backpatch it then the prior-version slots could be upgraded, but the case for backpatching needs broader consensus. For now, the idea is to get the core of the functionality committed, and then we can see if we get consensus on backpatching the commit you mentioned and probably changing the version checks in this work. -- With Regards, Amit Kapila.
Re: Avoid a possible out-of-bounds access (src/backend/optimizer/util/relnode.c)
On Sat, Sep 23, 2023 at 7:29 PM Ranier Vilela wrote: > > Hi, > > Per Coverity. > CID 1518088 (#2 of 2): Improper use of negative value (NEGATIVE_RETURNS) > > The function bms_singleton_member can returns a negative number. > > /* > * Get a child rel for rel2 with the relids. See above comments. > */ > if (rel2_is_simple) > { > int varno = bms_singleton_member(child_relids2); > > child_rel2 = find_base_rel(root, varno); > } > > It turns out that in the get_matching_part_pairs function (joinrels.c), the > return of bms_singleton_member is passed to the find_base_rel function, which > cannot receive a negative value. > > find_base_rel is protected by an Assertion, which effectively indicates that > the error does not occur in tests and in DEBUG mode. > > But this does not change the fact that bms_singleton_member can return a > negative value, which may occur on some production servers. > > Fix by changing the Assertion into a real test, to protect the > simple_rel_array array. Do you have a scenario where bms_singleton_member() actually returns a -ve number OR it's just a possibility. bms_make_singleton() has an assertion at the end: Assert(result >= 0); bms_make_singleton(), bms_add_member() explicitly forbids negative values. It looks like we have covered all the places which can add a negative value to a bitmapset. May be we are missing a place or two. It will be good to investigate it. What you are suggesting may be ok, as is as well. -- Best Wishes, Ashutosh Bapat
RE: [PoC] pg_upgrade: allow to upgrade publisher node
Dear Bharath, Thank you for giving comments! Before addressing your comments, I wanted to reply to some of them.

> 4.
> +/*
> + * There is a possibility that following records may be generated
> + * during the upgrade.
> + */
> +is_valid = is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) ||
> +is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) ||
> +is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_SWITCH) ||
> +is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_FPI_FOR_HINT) ||
> +is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_PARAMETER_CHANGE) ||
> +is_xlog_record_type(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) ||
> +is_xlog_record_type(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
>
> What if we missed to capture the WAL records that may be generated
> during upgrade?

If such records are generated before calling binary_upgrade_validate_wal_logical_end(), the upgrade would fail; otherwise it would succeed. Anyway, we don't care about such records because they aren't required to be replicated. The main thing we want to detect is that we don't miss any record generated before the server shutdown.

> What happens if a custom WAL resource manager generates table/index AM
> WAL records during upgrade?

If such records are found, we definitely cannot distinguish whether they are acceptable. We do not have a way to know the properties of custom WAL records. We did not address that, as there are other problems with the approach if such a facility is invoked. Please see the similar discussion [1].

> What happens if new WAL records are added that may be generated during
> the upgrade? Isn't keeping this code extensible and in sync with
> future changes a problem?

Actually, others also pointed out a similar concern. Originally we just compared confirmed_flush_lsn with the "latest checkpoint lsn" reported by pg_controldata, but found an issue: the upgrade could not pass if users run pg_upgrade --check just before the actual upgrade.
Then we discussed some ideas, but they had disadvantages, so we settled on the current one. Here is a summary which describes the current situation; it should be quite helpful [2] (maybe you have already seen it).

> Or we'd better say that any custom WAL
> records are found after the slot's confirmed flush LSN, then the slot
> isn't upgraded?

Once we conclude how to ensure this, we can add the sentence accordingly.

> 5. In continuation to the above comment:
>
> Why can't this logic be something like - if there's any WAL record
> seen after a slot's confirmed flush LSN is of type generated by WAL
> resource manager having the rm_decode function defined, then the slot
> can't be upgraded.

Thank you for suggesting a new approach! We had not considered it before, but at least the XLOG and HEAP2 rmgrs have a decode function, so XLOG_CHECKPOINT_SHUTDOWN, XLOG_CHECKPOINT_ONLINE, and XLOG_HEAP2_PRUNE could not be ignored with that approach, which seems not appropriate. If you have another approach, I would be very happy if you could post it.

[1]: https://www.postgresql.org/message-id/ZNZ4AxUMIrnMgRbo%40momjian.us
[2]: https://www.postgresql.org/message-id/CAA4eK1JVKZGRHLOEotWi%2Be%2B09jucNedqpkkc-Do4dh5FTAU%2B5w%40mail.gmail.com

Best Regards, Hayato Kuroda FUJITSU LIMITED
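For readers following along, the allow-list check quoted above boils down to comparing the resource-manager ID and the record-type bits of each record against an expected pair, ORing one call per allowed record type. A self-contained sketch with made-up constant values (the real IDs and masks live in the server headers; these numbers are assumptions for illustration only) might look like:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint8_t RmgrId;

/* Made-up constant values standing in for the real IDs declared in
 * the server headers (rmgrlist.h, pg_control.h, ...). */
#define RM_XLOG_ID               0
#define XLOG_CHECKPOINT_SHUTDOWN 0x10
#define XLOG_SWITCH              0x40
#define XLR_INFO_MASK            0x0F   /* low info bits carry flags */

/*
 * A record matches iff its resource manager equals the expected one
 * and the record-type bits of xl_info (flag bits masked off) equal
 * the expected record type.
 */
static bool
is_xlog_record_type(RmgrId rmid, uint8_t info,
                    RmgrId expected_rmid, uint8_t expected_info)
{
    return rmid == expected_rmid &&
        (info & ~XLR_INFO_MASK) == expected_info;
}
```

The extensibility worry raised above then reduces to keeping the list of such calls in sync with any new record type the server may emit during an upgrade.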
Re: [PATCH] Extend ALTER OPERATOR to support adding commutator, negator, hashes, and merges
/*
 * AlterOperator
 *		routine implementing ALTER OPERATOR SET (option = ...).
 *
 * Currently, only RESTRICT and JOIN estimator functions can be changed.
 */
ObjectAddress
AlterOperator(AlterOperatorStmt *stmt)

The above comment needs to change. Other than that, the patch passed the tests and works as expected.
Re: Make --help output fit within 80 columns per line
On 2023-09-25 15:27, torikoshia wrote: On Tue, Sep 19, 2023 at 3:23 AM Greg Sabino Mullane wrote: Thanks for your investigation! On Fri, Sep 15, 2023 at 11:11 AM torikoshia wrote: I do not intend to adhere to this rule (my terminals are usually bigger than 80 chars per line), but wouldn't it be a not bad direction to use 80 characters for all commands? Well, that's the question du jour, isn't it? The 80 character limit is based on punch cards, and really has no place in modern systems. While gnu systems are stuck in the past, many other ones have moved on to more sensible defaults:

$ wget --help | wc -L
110
$ gcloud --help | wc -L
122
$ yum --help | wc -L
122

git is an interesting one, as they force things through a pager for their help, but if you look at their raw help text files, they have plenty of times they go past 80 when needed:

$ wc -L git/Documentation/git-*.txt | sort -g | tail -20
 109 git-filter-branch.txt
 109 git-rebase.txt
 116 git-diff-index.txt
 116 git-http-fetch.txt
 117 git-restore.txt
 122 git-checkout.txt
 122 git-ls-tree.txt
 129 git-init-db.txt
 131 git-push.txt
 132 git-update-ref.txt
 142 git-maintenance.txt
 144 git-interpret-trailers.txt
 146 git-cat-file.txt
 148 git-repack.txt
 161 git-config.txt
 162 git-notes.txt
 205 git-stash.txt
 251 git-submodule.txt

So in summary, I think 80 is a decent soft limit, but let's not stress out about some lines going over that, and make a hard limit of perhaps 120.

+1. It may be a good compromise. For enforcing the hard limit, is it better to add a regression test like patch 0001?

Ugh, regression tests failed and it appears to be due to reasons related to meson. I'm going to investigate it.

-- Regards, Atsushi Torikoshi NTT DATA Group Corporation
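A regression test for the proposed soft/hard limits only needs the longest line of each --help output, i.e. the equivalent of wc -L. A minimal sketch of that measurement (byte width only, ignoring multi-byte characters, and not the patch's actual test code) could be:

```c
#include <assert.h>

/* Byte-width equivalent of `wc -L` for ASCII text: returns the width
 * of the longest line, which is all a width regression test needs to
 * compare against an 80-column soft limit or a 120-column hard limit. */
static int
max_line_width(const char *text)
{
    int max = 0;
    int cur = 0;

    for (const char *p = text; *p; p++)
    {
        if (*p == '\n')
        {
            if (cur > max)
                max = cur;
            cur = 0;
        }
        else
            cur++;
    }
    if (cur > max)              /* last line may lack a trailing newline */
        max = cur;
    return max;
}
```

A test would then assert max_line_width(help_text) <= 120 for each binary's --help output.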
Re: Index range search optimization
Sorry, I attached the wrong version previously. The correct v5 is attached. On Mon, 25 Sept 2023 at 13:58, Pavel Borisov wrote: > > Hi, Alexander! > > I found and fixed a couple of naming issues that came to v4 from > earlier patches. > Also, I added initialization of requiredMatchedByPrecheck in case of first > page. > > Please see patch v5. > > One more doubt about naming. Calling function > _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts, > ScanDirection dir, bool *continuescan, bool requiredMatchedByPrecheck) > as > (void) _bt_checkkeys(scan, itup, indnatts, dir, > &requiredMatchedByPrecheck, false); > looks little bit misleading because of coincidence of names of 5 and 6 > arguments. v5-0001-PATCH-Skip-checking-of-scan-keys-required-for-dir.patch Description: Binary data
Fix a wrong comment in setrefs.c
I noticed a wrong comment in search_indexed_tlist_for_sortgroupref().

foreach(lc, itlist->tlist)
{
    TargetEntry *tle = (TargetEntry *) lfirst(lc);

    /* The equal() check should be redundant, but let's be paranoid */
    if (tle->ressortgroupref == sortgroupref &&
        equal(node, tle->expr))
    {

It turns out that the equal() check is necessary, because the given sort/group expression might be a FuncExpr which converts integer to numeric. In this case we need to modify its args, not the expression itself, to reference the matching subplan output expression. As an example, consider

explain (costs off, verbose)
SELECT 1.1 AS two UNION (SELECT 2 UNION ALL SELECT 2);
          QUERY PLAN
-------------------------------
 HashAggregate
   Output: (1.1)
   Group Key: (1.1)
   ->  Append
         ->  Result
               Output: 1.1
         ->  Result
               Output: (2)
         ->  Append
               ->  Result
                     Output: 2
               ->  Result
                     Output: 2
(13 rows)

If we remove the equal() check, this query would crash at execution. I'm considering changing the comment as below.

-   /* The equal() check should be redundant, but let's be paranoid */
+   /*
+    * The equal() check is necessary, because expressions with the same
+    * sortgroupref might be different, e.g., the given sort/group
+    * expression can be a FuncExpr which converts integer to
+    * numeric, and we need to modify its args not itself to reference the
+    * matching subplan output expression in this case.
+    */

Any thoughts?

Thanks
Richard

v1-0001-Fix-a-wrong-comment-in-setrefs.c.patch Description: Binary data
Re: Is this a problem in GenericXLogFinish()?
On 22/09/2023 23:52, Jeff Davis wrote:

src/backend/access/transam/README says:

...
4. Mark the shared buffer(s) as dirty with MarkBufferDirty(). (This must
happen before the WAL record is inserted; see notes in SyncOneBuffer().)
...

But GenericXLogFinish() does this:

...
/* Insert xlog record */
lsn = XLogInsert(RM_GENERIC_ID, 0);

/* Set LSN and mark buffers dirty */
for (i = 0; i < MAX_GENERIC_XLOG_PAGES; i++)
{
    PageData   *pageData = &state->pages[i];

    if (BufferIsInvalid(pageData->buffer))
        continue;
    PageSetLSN(BufferGetPage(pageData->buffer), lsn);
    MarkBufferDirty(pageData->buffer);
}

END_CRIT_SECTION();

Am I missing something or is that a problem?

Yes, that's a problem. I wish we had an assertion for that. XLogInsert() could assert that the page is already marked dirty, for example.

-- Heikki Linnakangas
Neon (https://neon.tech)
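The ordering the README requires can be sketched with stand-in types: mark the buffers dirty first, insert the record, then stamp the LSN. FakeBuffer and the two functions below are toys, not the real bufmgr/xlog API; the point is only the order of the three steps, plus the assertion Heikki wishes XLogInsert() carried.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in; the real machinery is Buffer/Page plus XLogInsert(). */
typedef struct FakeBuffer
{
    bool     dirty;
    unsigned lsn;
} FakeBuffer;

/* Stand-in for XLogInsert(), with the wished-for assertion: every
 * registered buffer must already be dirty when the record is inserted,
 * per src/backend/access/transam/README. */
static unsigned
fake_xlog_insert(const FakeBuffer *bufs, int n)
{
    for (int i = 0; i < n; i++)
        assert(bufs[i].dirty);
    return 42;                      /* pretend LSN */
}

/* Corrected ordering for a GenericXLogFinish-style function:
 * dirty first, insert second, stamp the LSN last. */
static void
fake_generic_finish(FakeBuffer *bufs, int n)
{
    for (int i = 0; i < n; i++)
        bufs[i].dirty = true;       /* 1. MarkBufferDirty() */

    unsigned lsn = fake_xlog_insert(bufs, n);   /* 2. XLogInsert() */

    for (int i = 0; i < n; i++)
        bufs[i].lsn = lsn;          /* 3. PageSetLSN() */
}
```

With the ordering as in the quoted code (insert before dirty), the assertion in the stand-in XLogInsert() would fire, which is exactly the check being proposed.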
Re: Index range search optimization
Hi, Alexander!

I found and fixed a couple of naming issues that came to v4 from earlier patches. Also, I added initialization of requiredMatchedByPrecheck in the case of the first page.

Please see patch v5.

One more doubt about naming. Calling the function

_bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
              ScanDirection dir, bool *continuescan, bool requiredMatchedByPrecheck)

as

(void) _bt_checkkeys(scan, itup, indnatts, dir, &requiredMatchedByPrecheck, false);

looks a little bit misleading because of the coincidence of the names of the 5th and 6th arguments.

0001-Skip-checking-of-scan-keys-required-for-direction-v4.patch Description: Binary data
Re: RFC: Logging plan of the running query
On 25/9/2023 14:21, torikoshia wrote: On 2023-09-20 14:39, Lepikhov Andrei wrote: Hmm, as a test, I made sure to call ProcessLogQueryPlanInterrupt() on all CFIs using v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch, and then ran the following query, but it did not cause any problems.

```
=# CREATE TABLE test();
=# CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
BEGIN
    EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
    PERFORM pg_sleep(5);
END;
$$ LANGUAGE plpgsql VOLATILE;
=# SELECT ddl();
```

Is this the case you're worrying about?

I didn't find a problem either. I just feel uncomfortable if, at the moment of interruption, we hold a descriptor of a query other than the one that has been executing and holding resources.

-- regards, Andrey Lepikhov Postgres Professional
Re: POC: GUC option for skipping shared buffers in core dumps
Hi, The current approach could be better because we want to use it on Windows/MacOS and other systems. So, I've tried to develop another strategy - detaching shmem and DSM blocks before executing the abort() routine. As I can see, it helps and reduces the size of the core file. -- regards, Andrey Lepikhov Postgres Professional diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c index 5810f8825e..4d7bf2c0e4 100644 --- a/src/backend/bootstrap/bootstrap.c +++ b/src/backend/bootstrap/bootstrap.c @@ -325,7 +325,7 @@ BootstrapModeMain(int argc, char *argv[], bool check_only) { SetProcessingMode(NormalProcessing); CheckerModeMain(); - abort(); + pg_abort(); } /* diff --git a/src/backend/main/main.c b/src/backend/main/main.c index ed11e8be7f..34ac874ad0 100644 --- a/src/backend/main/main.c +++ b/src/backend/main/main.c @@ -197,7 +197,7 @@ main(int argc, char *argv[]) else PostmasterMain(argc, argv); /* the functions above should not return */ - abort(); + pg_abort(); } diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 54e9bfb8c4..fc32a6bb1b 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -1469,7 +1469,7 @@ PostmasterMain(int argc, char *argv[]) */ ExitPostmaster(status != STATUS_OK); - abort();/* not reached */ + pg_abort(); /* not reached */ } diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index e250b0567e..3123b388ab 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -385,7 +385,7 @@ WalSndShutdown(void) whereToSendOutput = DestNone; proc_exit(0); - abort();/* keep the compiler quiet */ + pg_abort(); /* keep the compiler quiet */ } /* diff --git a/src/backend/utils/error/assert.c b/src/backend/utils/error/assert.c index 719dd7b309..f422d42440 100644 --- a/src/backend/utils/error/assert.c +++ b/src/backend/utils/error/assert.c @@ -19,6 +19,32 @@ #include #endif +#include 
+#include + +int core_dump_no_shared_buffers = COREDUMP_INCLUDE_ALL; + +/* + * Remember, at the same time someone can work with shared memory, write them to + * disk and so on. + */ +void +pg_abort(void) +{ + if (core_dump_no_shared_buffers != COREDUMP_INCLUDE_ALL) + { + if (core_dump_no_shared_buffers == COREDUMP_EXCLUDE_ALL || + core_dump_no_shared_buffers == COREDUMP_EXCLUDE_DSM) + dsm_detach_all(); + + if (core_dump_no_shared_buffers == COREDUMP_EXCLUDE_ALL || + core_dump_no_shared_buffers == COREDUMP_EXCLUDE_SHMEM) + PGSharedMemoryDetach(); + } + + abort(); +} + /* * ExceptionalCondition - Handles the failure of an Assert() * @@ -63,5 +89,5 @@ ExceptionalCondition(const char *conditionName, sleep(100); #endif - abort(); + pg_abort(); } diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c index 8e1f3e8521..f6c863ca68 100644 --- a/src/backend/utils/error/elog.c +++ b/src/backend/utils/error/elog.c @@ -601,7 +601,7 @@ errfinish(const char *filename, int lineno, const char *funcname) * children... 
*/ fflush(NULL); - abort(); + pg_abort(); } /* diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c index bdb26e2b77..95e205e8d1 100644 --- a/src/backend/utils/misc/guc_tables.c +++ b/src/backend/utils/misc/guc_tables.c @@ -427,6 +427,14 @@ static const struct config_enum_entry debug_logical_replication_streaming_option {NULL, 0, false} }; +static const struct config_enum_entry core_dump_no_shared_buffers_options[] = { + {"none", COREDUMP_INCLUDE_ALL, false}, + {"shmem", COREDUMP_EXCLUDE_SHMEM, false}, + {"dsm", COREDUMP_EXCLUDE_DSM, false}, + {"all", COREDUMP_EXCLUDE_ALL, false}, + {NULL, 0, false} +}; + StaticAssertDecl(lengthof(ssl_protocol_versions_info) == (PG_TLS1_3_VERSION + 2), "array length mismatch"); @@ -4971,6 +4979,17 @@ struct config_enum ConfigureNamesEnum[] = NULL, NULL, NULL }, + { + {"core_dump_no_shared_buffers", PGC_POSTMASTER, DEVELOPER_OPTIONS, + NULL, + NULL, + GUC_NOT_IN_SAMPLE + }, + &core_dump_no_shared_buffers, + COREDUMP_INCLUDE_ALL, core_dump_no_shared_buffers_options, + NULL, NULL
Re: On login trigger: take three
Hi!

On Wed, Jun 14, 2023 at 10:49 PM Mikhail Gribkov wrote:
> The attached v40 patch is a fresh rebase on master branch to actualize the
> state before the upcoming commitfest.
> Nothing has changed beyond rebasing.
>
> And just for convenience, here is a link to the exact message of the thread
> describing the current approach:
> https://www.postgresql.org/message-id/CAMEv5_vg4aJOoUC74XJm%2B5B7%2BTF1nT-Yhtg%2BawtBOESXG5Grfg%40mail.gmail.com

Thank you for the update. I think the patch is interesting and in demand. The code, docs and tests already seem quite polished. At the same time, the thread is long and spread over a long time, so it's easy to lose track. I'd like to give a short summary of the design issues raised in this thread.

1. Initially the patch introduced "on login" triggers. It was unclear whether they would need to rerun on RESET ALL or DISCARD ALL [1]. The new name is "client connection" trigger, which seems fine [2].

2. Another question is how to deal with triggers that hang or fail with an error [1]. One possible way to work around that is single-user mode, which is already the advised workaround for errors in other event triggers. However, single-user mode might eventually be removed. Also, single-user mode is considered a worst-case recovery tool, while it's very easy to block all database connections with a client connection trigger. As an addition or alternative to single-user mode, GUC options could be provided to disable all event triggers and/or client connection triggers. Finally, the patch for the GUC option to disable all event triggers resides in a separate thread [4]. Apparently that patch should be committed first [5].

3. Yet another question is the connection-time overhead introduced by this patch. The overhead estimate varies from no measurable overhead [6] to 5% overhead [7]. To overcome that, [8] introduced a database-level flag indicating whether there are connection triggers.
Later this flag was removed [9] in the hope that the possible overhead is acceptable.

4. [10] points out that there is no clean way to store information about unsuccessful connections (declined by either authentication or a trigger). However, this is considered out of scope for the current patch, and could be implemented later if needed.

5. It was also pointed out [11] that ^C in psql doesn't cancel long-running client connection triggers. That might be considered a psql problem, though.

6. It has also been pointed out [12] that all triggers that write data to the database must check pg_is_in_recovery() to work correctly on standby. That seems to be reflected in the current documentation.

So, for me the open issues seem to be 2, 3 and 5. My plan to revive this patch is to commit the GUC patch [4], recheck the overhead, and probably leave the "^C in psql" problem as a separate standalone issue.

Any thoughts?

Links.
1. https://www.postgresql.org/message-id/CAFj8pRBdqdqvkU3mVKzoOnO+jPz-6manRV47CDEa+1jD6x6LFg%40mail.gmail.com
2. https://www.postgresql.org/message-id/CAFj8pRCxdQgHy8Mynk3hz6pFsqQ9BN6Vfgy0MJLtQBAUhWDf3w%40mail.gmail.com
3. https://www.postgresql.org/message-id/E0D5DC61-C490-45BD-A984-E8D56493EC4F%40yesql.se
4. https://www.postgresql.org/message-id/9140106e-f9bf-4d85-8fc8-f2d3c094a...@yesql.se
5. https://www.postgresql.org/message-id/20230306221010.gszjoakt5jp7oqpd%40awork3.anarazel.de
6. https://www.postgresql.org/message-id/90760f2d-2f9c-12ab-d2c5-e8e6fb7d08de%40postgrespro.ru
7. https://www.postgresql.org/message-id/CAFj8pRChwu01VLx76nKBVyScHCsd1YnBGiKfDJ6h17g4CSnUBg%40mail.gmail.com
8. https://www.postgresql.org/message-id/4471d472-5dfc-f2b0-ad05-0ff8d0a3bb0c%40postgrespro.ru
9. https://www.postgresql.org/message-id/CAMEv5_vg4aJOoUC74XJm%2B5B7%2BTF1nT-Yhtg%2BawtBOESXG5Grfg%40mail.gmail.com
10. https://www.postgresql.org/message-id/9c897136-4755-dcfc-2d24-b12bcfe4467f%40sigaev.ru
11.
https://www.postgresql.org/message-id/CA%2BTgmoZv9f1s797tihx-zXQN4AE4ZFBV5C0K%3DzngbgNu3xNNkg%40mail.gmail.com 12. https://www.postgresql.org/message-id/20220312024652.lvgehszwke4hhove%40alap3.anarazel.de -- Regards, Alexander Korotkov
Re: Regression in COPY FROM caused by 9f8377f7a2
On Mon, 2023-09-25 at 09:54 +0200, Laurenz Albe wrote: > In v16 and later, the following fails: > > CREATE TABLE boom (t character varying(5) DEFAULT 'a long string'); > > COPY boom FROM STDIN; > ERROR: value too long for type character varying(5) > > In PostgreSQL v15 and earlier, the COPY statement succeeds. > > The error is thrown in BeginCopyFrom in line 1578 (HEAD) > > defexpr = expression_planner(defexpr); > > Bisecting shows that the regression was introduced by commit 9f8377f7a2, > which introduced DEFAULT values for COPY FROM. I suggest the attached fix, which evaluates default values only if the DEFAULT option was specified or if the column does not appear in the column list of COPY. Yours, Laurenz Albe From 4af982c56df57a1a0f04340d394c297559fbabb5 Mon Sep 17 00:00:00 2001 From: Laurenz Albe Date: Mon, 25 Sep 2023 10:56:15 +0200 Subject: [PATCH] Evaluate defaults in COPY FROM only if necessary Since commit 9f8377f7a2, we evaluate the column default values in COPY FROM for all columns except generated ones, because they could be needed if the input value matches the DEFAULT option. This can cause a surprising regression: CREATE TABLE boom (t character varying(5) DEFAULT 'a long string'); COPY boom FROM STDIN; ERROR: value too long for type character varying(5) This worked before 9f8377f7a2, since default values were only evaluated for columns that were not specified in the column list. To fix, fetch the default values only if the DEFAULT option was specified or for columns not specified in the column list. 
---
 src/backend/commands/copyfrom.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 70871ed819..320b764aa9 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1567,7 +1567,14 @@ BeginCopyFrom(ParseState *pstate,
 		/* Get default info if available */
 		defexprs[attnum - 1] = NULL;
 
-		if (!att->attgenerated)
+		/*
+		 * We need the default values only for columns that do not appear in the
+		 * column list or if the DEFAULT option was given.  We also don't need
+		 * it for generated columns.
+		 */
+		if ((!list_member_int(cstate->attnumlist, attnum) ||
+			 cstate->opts.default_print != NULL) &&
+			!att->attgenerated)
 		{
 			Expr	   *defexpr = (Expr *) build_column_default(cstate->rel,
 															attnum);
-- 
2.41.0
Re: Failures on gombessa -- EIO?
Hi Thomas, I will check it today. Regards Nikola On Sat, 23 Sept 2023 at 04:54, Thomas Munro wrote: > I am not sure why REL_16_STABLE fails consistently as of ~4 days ago. > It seems like bad storage or something? Just now it happened also on > HEAD. I wonder why it would be sensitive to the branch. >
Re: Synchronizing slots from primary to standby
Hi,

On 9/23/23 3:38 AM, Amit Kapila wrote:
> On Fri, Sep 22, 2023 at 6:01 PM Drouvot, Bertrand wrote:
>> Thanks for all the work that has been done on this feature, and sorry
>> to have been quiet on it for so long.
>>
>> On 9/18/23 12:22 PM, shveta malik wrote:
>>> On Wed, Sep 13, 2023 at 4:48 PM Hayato Kuroda (Fujitsu) wrote:
>>>> Right, but I wanted to know why it is needed. One motivation seemed
>>>> to know the WAL location of physical standby, but I thought that
>>>> struct WalSnd.apply could be also used. Is it bad to assume that the
>>>> physical walsender always exists?
>>>
>>> We do not plan to target this case where physical slot is not created
>>> between primary and physical-standby in the first draft. In such a
>>> case, slot-synchronization will be skipped for the time being. We can
>>> extend this functionality (if needed) later.
>>
>> I do think it's needed to extend this functionality. Having physical
>> slot created sounds like a (too?) strong requirement as:
>>
>> - It has not been a requirement for Logical decoding on standby so that
>> could sounds weird to require it for sync slot (while it's not allowed
>> to logical decode from sync slots)
>
> There is a difference here that we also need to prevent removal of rows
> required by sync_slots. That could be achieved by physical slot (and
> hot_standby_feedback). So, having a requirement to have physical slot
> doesn't sound too unreasonable to me. Otherwise, we need to invent some
> new mechanism of having some sort of placeholder slot to avoid removal
> of required rows.

Thinking about it, I wonder if removal of required rows is even possible given that:

- we don't allow to logical decode from a sync slot
- sync slot catalog_xmin <= its primary counterpart's catalog_xmin
- its primary counterpart prevents rows removal thanks to its own catalog_xmin
- a sync slot is removed as soon as its primary counterpart is removed

In that case I'm not sure how rows removal on the primary could lead to removing rows required by a sync slot. Am I missing something? Do you have a scenario in mind?
Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: [PoC] pg_upgrade: allow to upgrade publisher node
On Mon, Sep 25, 2023 at 1:23 PM Bharath Rupireddy wrote:
>
> On Mon, Sep 25, 2023 at 12:32 PM Dilip Kumar wrote:
> >
> > > > Is there anything else that stops this patch from supporting migration
> > > > of logical replication slots from PG versions < 17?
> >
> > IMHO one of the main change we are doing in PG 17 is that on shutdown
> > checkpoint we are ensuring that if the confirmed flush lsn is updated
> > since the last checkpoint and that is not yet synched to the disk then
> > we are doing so. I think this is the most important change otherwise
> > many slots for which we have already streamed all the WAL might give
> > an error assuming that there are pending WAL from the slots which are
> > not yet confirmed.
> >
> > You might need to refer to [1] for the change I am talking about
> >
> > [1] https://www.postgresql.org/message-id/CAA4eK1%2BLtWDKXvxS7gnJ562VX%2Bs3C6%2B0uQWamqu%3DUuD8hMfORg%40mail.gmail.com
>
> I see. IIUC, without that commit e0b2eed [1], it may happen that the
> slot's on-disk confirmed_flush LSN value can be higher than the WAL
> LSN that's flushed to disk, no? If so, can't it be detected if the WAL
> at confirmed_flush LSN is valid or not when reading WAL with
> xlogreader machinery?

Actually, without this commit the slot's "confirmed_flush LSN" value in memory can be higher than the value on disk: as you can see in LogicalConfirmReceivedLocation(), if we change only the confirmed flush LSN, the slot is not marked dirty, which means that on shutdown the slot will not be persisted to disk. But logically this will not cause any issue, so we cannot treat it as a bug; it may cause us to process some extra records after a restart, but that is not really a bug.

> What if the commit e0b2eed [1] is treated to be fixing a bug with the
> reasoning [2] and backpatch? When done so, it's easy to support
> upgradation/migration of logical replication slots from PG versions <
> 17, no?
Maybe this could be backpatched in order to support this upgrade from the older versions, but not as a bug fix.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: pipe_read_line for reading arbitrary strings
> On 4 Jul 2023, at 14:50, Daniel Gustafsson wrote: > >> On 4 Jul 2023, at 13:59, Heikki Linnakangas wrote: >> On 08/03/2023 00:05, Daniel Gustafsson wrote: > >>> If we are going to continue using this for reading $stuff from pipes, maybe >>> we >>> should think about presenting a nicer API which removes that risk? >>> Returning >>> an allocated buffer which contains all the output along the lines of the >>> recent >>> pg_get_line work seems a lot nicer and safer IMO. >> >> +1 > > Thanks for review! > >>> /* >>> * Execute a command in a pipe and read the first line from it. The returned >>> * string is allocated, the caller is responsible for freeing. >>> */ >>> char * >>> pipe_read_line(char *cmd) >> >> I think it's worth being explicit here that it's palloc'd, or malloc'd in >> frontend programs, rather than just "allocated". Like in pg_get_line. > > Good point, I'll make that happen before committing this. Fixed, along with commit message wordsmithing in the attached. Unless objected to I'll go ahead with this version. -- Daniel Gustafsson v3-0001-Refactor-pipe_read_line-to-return-the-full-line.patch Description: Binary data
Regression in COPY FROM caused by 9f8377f7a2
In v16 and later, the following fails: CREATE TABLE boom (t character varying(5) DEFAULT 'a long string'); COPY boom FROM STDIN; ERROR: value too long for type character varying(5) In PostgreSQL v15 and earlier, the COPY statement succeeds. The error is thrown in BeginCopyFrom in line 1578 (HEAD) defexpr = expression_planner(defexpr); Bisecting shows that the regression was introduced by commit 9f8377f7a2, which introduced DEFAULT values for COPY FROM. The table definition is clearly silly, so I am not sure if that regression is worth fixing. On the other hand, it is not cool if something that worked without an error in v15 starts to fail later on. Yours, Laurenz Albe
Re: [PoC] pg_upgrade: allow to upgrade publisher node
On Mon, Sep 25, 2023 at 12:32 PM Dilip Kumar wrote: > > > > Is there anything else that stops this patch from supporting migration > > > of logical replication slots from PG versions < 17? > > > > IMHO one of the main change we are doing in PG 17 is that on shutdown > > checkpoint we are ensuring that if the confirmed flush lsn is updated > > since the last checkpoint and that is not yet synched to the disk then > > we are doing so. I think this is the most important change otherwise > > many slots for which we have already streamed all the WAL might give > > an error assuming that there are pending WAL from the slots which are > > not yet confirmed. > > > > You might need to refer to [1] for the change I am talking about > > [1] > https://www.postgresql.org/message-id/CAA4eK1%2BLtWDKXvxS7gnJ562VX%2Bs3C6%2B0uQWamqu%3DUuD8hMfORg%40mail.gmail.com I see. IIUC, without that commit e0b2eed [1], it may happen that the slot's on-disk confirmed_flush LSN value can be higher than the WAL LSN that's flushed to disk, no? If so, can't it be detected if the WAL at confirmed_flush LSN is valid or not when reading WAL with xlogreader machinery? What if the commit e0b2eed [1] is treated to be fixing a bug with the reasoning [2] and backpatch? When done so, it's easy to support upgradation/migration of logical replication slots from PG versions < 17, no? [1] commit e0b2eed047df9045664da6f724cb42c10f8b12f0 Author: Amit Kapila Date: Thu Sep 14 08:56:13 2023 +0530 Flush logical slots to disk during a shutdown checkpoint if required. [2] It can also help avoid processing the same transactions again in some boundary cases after the clean shutdown and restart. Say, we process some transactions for which we didn't send anything downstream (the changes got filtered) but the confirm_flush LSN is updated due to keepalives. As we don't flush the latest value of confirm_flush LSN, it may lead to processing the same changes again without this patch. 
-- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: GUC for temporarily disabling event triggers
> On 25 Sep 2023, at 09:50, Michael Paquier wrote: > > On Mon, Sep 25, 2023 at 09:19:35AM +0200, Daniel Gustafsson wrote: >> Fair enough, although I used "fire" instead of "run" which is consistent with >> the event trigger documentation. > > Okay by me. Great, I'll go ahead and apply this version then. Thanks for review! -- Daniel Gustafsson
Re: GUC for temporarily disabling event triggers
On Mon, Sep 25, 2023 at 09:19:35AM +0200, Daniel Gustafsson wrote: > Fair enough, although I used "fire" instead of "run" which is consistent with > the event trigger documentation. Okay by me. -- Michael signature.asc Description: PGP signature
Re: [PoC] pg_upgrade: allow to upgrade publisher node
On Sat, Sep 23, 2023 at 10:18 AM Hayato Kuroda (Fujitsu) wrote:
>
> Again, thank you for reviewing! Here is a new version patch.

Here are some more comments/thoughts on the v44 patch:

1.
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+[

Add a test case to hit fprintf(script, "The slot \"%s\" is invalid\n", file as well?

2.
+'run of pg_upgrade where the new cluster has insufficient max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+"pg_upgrade_output.d/ not removed after pg_upgrade failure");

+'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+"pg_upgrade_output.d/ not removed after pg_upgrade failure");

+'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+"pg_upgrade_output.d/ not removed after pg_upgrade failure");

How do these tests recognize that the failures are the intended ones? I mean, for instance, when pg_upgrade fails for unused replication slots/unconsumed WAL records, just looking at the presence of pg_upgrade_output.d might not be sufficient, no? Using command_fails_like instead of command_fails and looking at the contents of invalid_logical_relication_slots.txt might help make these tests more focused.

3.
+pg_log(PG_REPORT, "fatal");
+pg_fatal("Your installation contains invalid logical replication slots.\n"
 "These slots can't be copied, so this cluster cannot be upgraded.\n"
 "Consider removing such slots or consuming the pending WAL if any,\n"
 "and then restart the upgrade.\n"
 "A list of invalid logical replication slots is in the file:\n"
 "%s", output_path);

It's not just the invalid logical replication slots, but also the slots with unconsumed WAL, which aren't invalid and can be upgraded once the WAL is consumed.
So, a better wording would be:

pg_fatal("Your installation contains logical replication slots that cannot be upgraded.\n"
 "These slots can't be copied, so this cluster cannot be upgraded.\n"
 "Consider removing invalid slots and/or consuming the pending WAL if any,\n"
 "and then restart the upgrade.\n"
 "A list of all such logical replication slots is in the file:\n"
 "%s", output_path);

4.
+/*
+ * There is a possibility that following records may be generated
+ * during the upgrade.
+ */
+is_valid = is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) ||
+is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) ||
+is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_SWITCH) ||
+is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_FPI_FOR_HINT) ||
+is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_PARAMETER_CHANGE) ||
+is_xlog_record_type(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) ||
+is_xlog_record_type(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE);

What if we miss capturing some WAL records that may be generated during the upgrade? What happens if a custom WAL resource manager generates table/index AM WAL records during the upgrade? What happens if new WAL record types are added that may be generated during the upgrade? Isn't keeping this code extensible and in sync with future changes a problem? Or had we better say that if any custom WAL records are found after a slot's confirmed flush LSN, then the slot isn't upgraded?

5. In continuation of the above comment: why can't this logic be something like - if any WAL record seen after a slot's confirmed flush LSN is of a type generated by a WAL resource manager that has the rm_decode function defined, then the slot can't be upgraded.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Invalidate the subscription worker in cases where a user loses their superuser status
On Mon, 25 Sept 2023 at 00:32, vignesh C wrote: > > On Sat, 23 Sept 2023 at 11:28, Amit Kapila wrote: > > > > On Sat, Sep 23, 2023 at 1:27 AM vignesh C wrote: > > > > > > > > > Fixed this issue by checking if the subscription owner has changed > > > from superuser to non-superuser in case the pg_authid rows changes. > > > The attached patch has the changes for the same. > > > > > > > @@ -3952,7 +3953,9 @@ maybe_reread_subscription(void) > > newsub->passwordrequired != MySubscription->passwordrequired || > > strcmp(newsub->origin, MySubscription->origin) != 0 || > > newsub->owner != MySubscription->owner || > > - !equal(newsub->publications, MySubscription->publications)) > > + !equal(newsub->publications, MySubscription->publications) || > > + (!superuser_arg(MySubscription->owner) && > > + MySubscription->isownersuperuser)) > > { > > if (am_parallel_apply_worker()) > > ereport(LOG, > > @@ -4605,6 +4608,13 @@ InitializeLogRepWorker(void) > > proc_exit(0); > > } > > > > + /* > > + * Fetch subscription owner is a superuser. This value will be later > > + * checked to see when there is any change with this role and the worker > > + * will be restarted if required. > > + */ > > + MySubscription->isownersuperuser = superuser_arg(MySubscription->owner); > > > > Why didn't you filled this parameter in GetSubscription() like other > > parameters? If we do that then the comparison of first change in your > > patch will look similar to all other comparisons. > > I felt this variable need not be added to the pg_subscription catalog > table, instead we could save the state of subscription owner when the > worker is started and compare this value during invalidations. As this > information is added only to the memory Subscription structure and not > added to the catalog FormData_pg_subscription, the checking is > slightly different in this case. Also since this variable will be used > only within the worker, I felt we need not add it to the catalog. 
On further thinking I felt getting superuser can be moved to GetSubscription, which will make the code consistent with the other checks and will address the comment Amit had given. The attached version has the change for the same.

Regards,
Vignesh

From f5731c4bde69e4c7f1a8c10f795d49956d335694 Mon Sep 17 00:00:00 2001
From: Vignesh C
Date: Fri, 22 Sep 2023 15:12:23 +0530
Subject: [PATCH v2] Restart the apply worker if the subscription owner has
 changed from superuser to non-superuser.

Restart the apply worker if the subscription owner has changed from
superuser to non-superuser. This is required so that the subscription
connection string gets revalidated to identify cases where the password
option is not specified as part of the connection string for
non-superuser.
---
 src/backend/catalog/pg_subscription.c      |  3 +++
 src/backend/replication/logical/worker.c   | 15 ++++++++++++---
 src/include/catalog/pg_subscription.h      |  1 +
 src/test/subscription/t/027_nosuperuser.pl | 21 +++++++++++++++++++++
 4 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index d07f88ce28..bc74b695c6 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -108,6 +108,9 @@ GetSubscription(Oid subid, bool missing_ok)
 								Anum_pg_subscription_suborigin);
 	sub->origin = TextDatumGetCString(datum);
 
+	/* Get superuser for subscription owner */
+	sub->isownersuperuser = superuser_arg(sub->owner);
+
 	ReleaseSysCache(tup);
 
 	return sub;
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 597947410f..8dd41836b1 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -3942,7 +3942,8 @@ maybe_reread_subscription(void)
 	 * The launcher will start a new worker but note that the parallel apply
 	 * worker won't restart if the streaming option's value is changed from
 	 * 'parallel' to any other value or the server decides not to stream the
-	 * in-progress transaction.
+	 * in-progress transaction. Exit if the owner of the subscription has
+	 * changed from superuser to a non-superuser.
 	 */
 	if (strcmp(newsub->conninfo, MySubscription->conninfo) != 0 ||
 		strcmp(newsub->name, MySubscription->name) != 0 ||
@@ -3952,7 +3953,8 @@ maybe_reread_subscription(void)
 		newsub->passwordrequired != MySubscription->passwordrequired ||
 		strcmp(newsub->origin, MySubscription->origin) != 0 ||
 		newsub->owner != MySubscription->owner ||
-		!equal(newsub->publications, MySubscription->publications))
+		!equal(newsub->publications, MySubscription->publications) ||
+		(!newsub->isownersuperuser && MySubscription->isownersuperuser))
 	{
 		if (am_parallel_apply_worker())
 			ereport(LOG,
@@ -4621,11 +4623,18 @@ InitializeLogRepWorker(void)
 	SetConfigOption("synchronous_commit", MySubscription->synccommit,
 					PGC_BACKEND, PGC_S_OVERRIDE);
 
-	/* Keep us in
Re: Confused about gram.y referencs in Makefile?
> On 25 Sep 2023, at 05:34, Japin Li wrote:
> Maybe referencing src/backend/parser/Makefile is better than the current one.
>
> How about "See gram.h target's comment in src/backend/parser/Makefile"
> or just "See src/backend/parser/Makefile"?

The latter seems more stable; if the Makefile is ever restructured, it's almost guaranteed that this comment will be missed, with the location info becoming stale.

-- 
Daniel Gustafsson
Re: Various small doc improvements; plpgsql, schemas, permissions, oidvector
> On 25 Sep 2023, at 00:57, Karl O. Pinc wrote: > I have made various, mostly unrelated to each other, > small improvements to the documentation. These > are usually in the areas of plpgsql, schemas, and permissions. > Most change 1 lines, but some supply short overviews. > > "Short" is subjective, so if these need to be > broken into different threads or different > commitfest entries let me know. While I agree it's subjective, I don't think adding a new section or paragraph qualifies as short or small. I would prefer if each (related) change is in a single commit with a commit message which describes the motivation for the change. A reviewer can second-guess the rationale for the changes, but they shouldn't have to. The resulting patchset can all be in the same thread though. -- Daniel Gustafsson
Re: RFC: Logging plan of the running query
On 2023-09-20 14:39, Lepikhov Andrei wrote:

Thanks for your reply.

On Tue, Sep 19, 2023, at 8:39 PM, torikoshia wrote:

On 2023-09-15 15:21, Lepikhov Andrei wrote:

On Thu, Sep 7, 2023, at 1:09 PM, torikoshia wrote:

I have explored this patch and, by and large, agree with the code. But there are some questions I want to discuss:

1. Maybe add a hook to give a chance for extensions, like pg_query_state, to do their job?

Do you imagine adding a hook which enables adding custom interrupt codes like below?
https://github.com/postgrespro/pg_query_state/blob/master/patches/custom_signals_15.0.patch

No, I am thinking of a hook which would allow us to rewrite the pg_query_state extension without additional patches, by using the functionality provided by your patch. I mean, an extension could provide a console UI, not only server logging, and obtain from the target backend as much information in the explain as the instrumentation level of the current query can give. It may make pg_query_state shorter and more stable.

2. In this implementation, ProcessInterrupts does a lot of work during the explain creation: memory allocations, a pass through the plan, systable locks, syscache access, etc. I guess it can change the semantics of a 'safe place' where CHECK_FOR_INTERRUPTS can be called - I can imagine a SELECT query which calls a stored procedure, which executes some DDL and acquires a row exclusive lock at the time of interruption. So, in my mind, it is too unpredictable to build the explain at the place of interruption processing. Maybe it is better to think about some hook at the end of ExecProcNode, where a pending explain could be created?
Yeah, I withdrew this patch once for that reason, but we are resuming development in response to the results of a discussion by James and others at this year's pgcon about the safety of this process in CFI:
https://www.postgresql.org/message-id/CAAaqYe9euUZD8bkjXTVcD9e4n5c7kzHzcvuCJXt-xds9X4c7Fw%40mail.gmail.com

> I can't track the logic path of the decision provided at this
> conference. But my anxiety relates to a specific case, where
> ActiveQueryDesc points to the top-level query and the interruption
> comes during DDL wrapped up in a stored procedure. For example:
>
> CREATE TABLE test();
> CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
> BEGIN
>   EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
>   ...
> END; $$ LANGUAGE plpgsql VOLATILE;
> SELECT ddl(), ... FROM ...;

Hmm, as a test, I made sure to call ProcessLogQueryPlanInterrupt() at every CFI using v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch, and then ran the following query, which did not cause any problems:

```
=# CREATE TABLE test();
=# CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
BEGIN
  EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
  PERFORM pg_sleep(5);
END; $$ LANGUAGE plpgsql VOLATILE;
=# SELECT ddl();
```

Is this the case you're worrying about?

-- 
Regards,

Atsushi Torikoshi
NTT DATA Group Corporation