Re: REVOKE FROM warning on grantor
Hi Tom,

Thanks for your answer.

> It does not say that that set must be nonempty. Admittedly it's not
> very clear from this one point. However, if you look around in the
> standard it seems clear that they expect no-op revokes to be no-ops
> not errors.

Postgres actually identifies memberships to revoke. The list is not empty. Even if the revoker has the USAGE privilege on the parent role, the membership is protected by a new check on the grantor of the membership. This semantic is new to me, and I guess it may confuse other people too. I would compare denied revoking of a role with revoking a privilege on a denied table:

> REVOKE SELECT ON TABLE toto FROM PUBLIC ;
ERROR:  permission denied for table toto

> Even taking the position that this is an unspecified point that we
> could implement how we like, I don't think there's a sufficient
> argument for changing behavior that's stood for a couple of decades.

In Postgres 15, revoking a membership granted by another role is accepted. I suspect this is related to the new CREATEROLE behaviour implemented by Robert Haas (which is a great job anyway). Attached is a script to reproduce. Here is the output on Postgres 15:

SET
DROP ROLE
DROP ROLE
DROP ROLE
CREATE ROLE
CREATE ROLE
CREATE ROLE
GRANT ROLE
SET
REVOKE ROLE
DO

Here is the output of the same script on Postgres 16:

SET
DROP ROLE
DROP ROLE
DROP ROLE
CREATE ROLE
CREATE ROLE
CREATE ROLE
GRANT ROLE
SET
psql:ldap2pg/my-revoke.sql:12: WARNING:  role "r" has not been granted membership in role "g" by role "m"
REVOKE ROLE
psql:ldap2pg/my-revoke.sql:18: ERROR:  REVOKE failed
CONTEXTE : PL/pgSQL function inline_code_block line 4 at RAISE

Can you confirm this?

Regards,
Étienne

my-revoke.sql
Description: application/sql
Re: What about Perl autodie?
> On 18 Mar 2024, at 14:18, Dagfinn Ilmari Mannsåker wrote:
> Daniel Gustafsson writes:
>> It would have been nice to standardize on using one of "|| die" and
>> "or die" consistently but that's clearly not for this body of work.
>
> "or die" is generally the preferred form, since || has higher precedence
> than comma, so it's easy to make mistakes if you don't parenthesise the
> function args, like:
>
>     open my $fh, '>', $filename || die "can't open $filename: $!";
>
> which will only fail if $filename is falsy (i.e. undef, "", or "0").

Thanks for the clarification! Looking over the || die() codepaths we have, and we'll add as part of this patchset, none are vulnerable to the above issue AFAICT.

--
Daniel Gustafsson
Re: sslinfo extension - add notbefore and notafter timestamps
On Fri, Mar 8, 2024 at 4:16 PM Cary Huang wrote:
> Yes, I noticed this in the SSL test too. I am also in GMT-8, so for me the
> tests would fail too due to the time zone differences from GMT. It shall be
> okay to specifically set the outputs of pg_stat_ssl,
> ssl_client_get_notbefore, and ssl_client_get_notafter to be in GMT time zone.
> The not before and not after time stamps in a client certificate are
> generally expressed in GMT.

Hi Cary, did you have any thoughts on the timestamptz notes from my last mail?

> It might also be nice to rename ASN1_TIME_to_timestamp().
>
> Squinting further at the server backend implementation, should that
> also be using TimestampTz throughout, instead of Timestamp? It all
> goes through float8_timestamptz at the end, so I guess it shouldn't
> have a material impact, but it's a bit confusing.

Thanks,
--Jacob
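Since certificate validity times are ASN.1 times expressed in GMT, the conversion under discussion amounts to parsing them as timezone-aware UTC values so the rendered output does not depend on the client's locale. A rough Python stand-in (not the OpenSSL/C code in the patch; the helper name and sample timestamp are made up):

```python
from datetime import datetime

def asn1_generalizedtime_to_utc(s):
    """Parse an ASN.1 GeneralizedTime such as '20240318141600Z' into a
    timezone-aware UTC datetime, so display is independent of the
    client's local time zone (the issue the SSL test ran into)."""
    # strptime's %z accepts '+0000'; normalize the trailing 'Z' to it.
    return datetime.strptime(s.replace("Z", "+0000"), "%Y%m%d%H%M%S%z")

not_after = asn1_generalizedtime_to_utc("20240318141600Z")
# not_after.isoformat() == '2024-03-18T14:16:00+00:00'
```

Returning a timezone-aware value here mirrors the argument for using TimestampTz rather than Timestamp in the backend path.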
Re: Is there still password max length restrictions in PG?
> On 18 Mar 2024, at 14:29, Sean wrote: > Need some document to make a clarification or suggestion to the user? The suggestion is to not use password authentication but instead use SCRAM. -- Daniel Gustafsson
Re: Alternative SAOP support for GiST
Hi All,

> On 11 Mar 2024, at 18:58, Michał Kłeczek wrote:
>
> Hi All,
>
> During my journey of designing a Pg-based solution at my work I was severely
> hit by several shortcomings in GiST. The most severe one is the lack of
> support for SAOP filters, as it makes it difficult to have partition pruning
> and index (only) scans working together.
>
> To overcome the difficulties I implemented a simple extension:
> https://github.com/mkleczek/btree_gist_extra
>
> Since it provides a separate operator class from btree_gist it requires
> re-indexing the data. So I thought maybe it would be a good idea to
> incorporate it into btree_gist.

While working on (sort of) SAOP support for GiST I was stuck on the interplay between the GiST consistent function and partition pruning. Not sure how it applies to SAOP handling in general though.

I've implemented an operator class/family that supports GiST index scans for the following query:

SELECT * FROM tbl WHERE col ||= ARRAY['a', 'b', 'c'];

It all works well except cases where tbl is partitioned by the "col" column. In this case the index scan unnecessarily scans pages for values that are not in the partition. I am not sure if it works as expected (ie. no unnecessary scans) in the case of = ANY (ARRAY[]) and nbtree.

In the case of GiST the only place where pre-processing of queries can be performed is the consistent function. But right now there is no possibility to access any scan-related information as it is not passed to the consistent function. The only thing available is GISTENTRY and it does not contain any metadata.

As a workaround I've added options to the op family that allow (redundantly) specifying MODULUS/REMAINDER for the index:

CREATE INDEX idx ON tbl_partition_01 USING gist (
  col gist_text_extra_ops (modulus=4, remainder=2)
)

and use the values to filter out the query array passed to the consistent function.
This is of course not ideal:

- the information is redundant
- changing the partitioning scheme requires re-indexing

I am wondering if it is possible to extend the consistent function API so that either the ScanKey or even the whole IndexScanDesc is passed as an additional parameter.

—
Michal
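The modulus/remainder workaround described above boils down to filtering the query array down to values the partition can actually contain, before the scan descends. A toy sketch of the idea (Python, with a stand-in hash function; the real code would use the partition's hash opclass, and the function name is made up):

```python
def filter_for_partition(values, modulus, remainder, hashfn=hash):
    """Drop query-array elements that hash to a different partition, so
    the index scan never visits pages for values this partition cannot
    contain."""
    return [v for v in values if hashfn(v) % modulus == remainder]

# With modulus=4, remainder=2 and an identity hash, only values that are
# 2 (mod 4) survive the filter.
keys = filter_for_partition(list(range(10)), 4, 2, hashfn=lambda v: v)
# keys == [2, 6]
```

Passing the partition bound info through the scan (ScanKey/IndexScanDesc) instead of opclass options would let this filtering happen without the redundant index definition.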
Is there still password max length restrictions in PG?
Hi All,

Just noticed the definition:

postgres=# \d pg_shadow
 .
 usebypassrls | boolean | | |
 passwd | text | C | |
 .

Looks like there is no length restriction for the password of a user. And in the code change history, 67a472d71c ("Remove arbitrary restrictions on password length.", 2020-09-03) seems to have removed the length restriction. (In the history, there was a limit of 100 or even a max length of 1024.)

So, here, just a minor question: can we consider that there is no max length restriction for the password of a user? Need some document to make a clarification or suggestion to the user?

BR,
Sean He (iih...@qq.com)
Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
On Sat, 16 Mar 2024 at 01:12, Peter Geoghegan wrote: > > On Fri, Mar 8, 2024 at 9:00 AM Matthias van de Meent > wrote: > > I've attached v14, where 0001 is v13, 0002 is a patch with small > > changes + some large comment suggestions, and 0003 which contains > > sorted merge join code for _bt_merge_arrays. > > I have accepted your changes from 0003. Agree that it's better that > way. It's at least a little faster, but not meaningfully more > complicated. Thanks. > > I'll try to work a bit on v13/14's _bt_preprocess_keys, and see what I > > can make of it. > > That's been the big focus of this new v15, which now goes all out with > teaching _bt_preprocess_keys with how to deal with arrays. We now do > comprehensive normalization of even very complicated combinations of > arrays and non-array scan keys in this version. I was thinking about a more unified processing model, where _bt_preprocess_keys would iterate over all keys, including processing of array keys (by merging and reduction to normal keys) if and when found. This would also reduce the effort expended when there are contradictory scan keys, as array preprocessing is relatively more expensive than other scankeys and contradictions are then found before processing of later keys. As I wasn't very far into the work yet it seems I can reuse a lot of your work here. > For example, consider this query: [...] > This has a total of 6 input scankeys -- 3 of which are arrays. But by > the time v15's _bt_preprocess_keys is done with it, it'll have only 1 > scan key -- which doesn't even have an array (not anymore). And so we > won't even need to advance the array keys one single time -- there'll > simply be no array left to advance. In other words, it'll be just as > if the query was written this way from the start: > > select * > from multi_test > where > a = 3::int; > > (Though of course the original query will spend more cycles on > preprocessing, compared to this manually simplified variant.) 
That's a good improvement, much closer to an optimal access pattern. > It turned out to not be terribly difficult to teach > _bt_preprocess_keys everything it could possibly need to know about > arrays, so that it can operate on them directly, as a variant of the > standard equality strategy (we do still need _bt_preprocess_array_keys > for basic preprocessing of arrays, mostly just merging them). This is > better overall (in that it gets every little subtlety right), but it > also simplified a number of related issues. For example, there is no > longer any need to maintain a mapping between so->keyData[]-wise scan > keys (output scan keys), and scan->keyData[]-wise scan keys (input > scan keys). We can just add a step to fix-up the references to the end > of _bt_preprocess_keys, to make life easier within > _bt_advance_array_keys. > > This preprocessing work should all be happening during planning, not > during query execution -- that's the design that makes the most sense. > This is something we've discussed in the past in the context of skip > scan (see my original email to this thread for the reference). Yes, but IIRC we also agreed that it's impossible to do this fully in planning, amongst others due to joins on array fields. > It > would be especially useful for the very fancy kinds of preprocessing > that are described by the MDAM paper, like using an index scan for a > NOT IN() list/array (this can actually make sense with a low > cardinality index). Yes, indexes such as those on enums. Though, in those cases the NOT IN () could be transformed into IN()-lists by the planner, but not the index. > The structure for preprocessing that I'm working towards (especially > in v15) sets the groundwork for making those shifts in the planner, > because we'll no longer treat each array constant as its own primitive > index scan during preprocessing. I hope that's going to be a fully separate patch. I don't think I can handle much more complexity in this one. 
> Right now, on HEAD, preprocessing > with arrays kinda treats each array constant like the parameter of an > imaginary inner index scan, from an imaginary nested loop join. But > the planner really needs to operate on the whole qual, all at once, > including any arrays. An actual nestloop join's inner index scan > naturally cannot do that, and so might still require runtime/dynamic > preprocessing in a world where that mostly happens in the planner -- > but that clearly not appropriate for arrays ("WHERE a = 5" and "WHERE > a in(4, 5)" are almost the same thing, and so should be handled in > almost the same way by preprocessing). Yeah, if the planner could handle some of this that'd be great. At the same time, I think that this might need to be gated behind a guc for more expensive planner-time deductions. > > > + * We generally permit primitive index scans to continue onto the > > > next > > > + * sibling page when the page's finaltup satisfies all required scan > > > keys > > > + *
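The kind of normalization discussed in this thread can be pictured outside the B-tree code: redundant equality-array quals on the same column reduce to their intersection, a contradiction lets the scan be skipped entirely, and a single surviving value degenerates to a plain equality key (the "a = 3" example above). A toy sketch, not the actual _bt_preprocess_keys logic:

```python
def preprocess_equality_arrays(quals):
    """quals maps column -> list of value-arrays from `col = ANY(...)` keys.
    Returns one merged, sorted array per column, or None when the quals
    are contradictory (empty intersection => scan can be skipped)."""
    merged = {}
    for col, arrays in quals.items():
        vals = set(arrays[0])
        for arr in arrays[1:]:
            vals &= set(arr)          # merge redundant arrays by intersection
        if not vals:
            return None               # contradictory quals
        merged[col] = sorted(vals)    # a 1-element result is just `col = x`
    return merged

# WHERE a = ANY('{1,2,3}') AND a = ANY('{2,3,4}')  ->  a = ANY('{2,3}')
print(preprocess_equality_arrays({"a": [[1, 2, 3], [2, 3, 4]]}))  # {'a': [2, 3]}
```

Doing this once up front is what removes the need to advance array keys at runtime for quals that collapse to a single constant.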
Re: What about Perl autodie?
Daniel Gustafsson writes:
>> On 18 Mar 2024, at 07:27, Peter Eisentraut wrote:
>
>> After some pondering, I figured the exclude list is better.
>
> Agreed.
>
>> So here is a squashed patch, also with a complete commit message.
>
> Looks good from a read-through. It would have been nice to standardize on
> using one of "|| die" and "or die" consistently but that's clearly not for
> this body of work.

"or die" is generally the preferred form, since || has higher precedence than comma, so it's easy to make mistakes if you don't parenthesise the function args, like:

    open my $fh, '>', $filename || die "can't open $filename: $!";

which will only fail if $filename is falsy (i.e. undef, "", or "0").

- ilmari
Re: Support json_errdetail in FRONTEND builds
On Sun, Mar 17, 2024 at 4:49 PM Daniel Gustafsson wrote: > I took another look at this tonight and committed it with some mostly cosmetic > changes. Great! Thanks everyone. --Jacob
Re: Possibility to disable `ALTER SYSTEM`
> On 18 Mar 2024, at 13:57, Robert Haas wrote:
> my proposal is something like this, taking a bunch of text from Jelte's
> patch and some inspiration from Magnus's earlier remarks:

I still think any wording should clearly mention that settings in the file are still applied. The proposed wording says so implicitly, but to avoid confusion I think it should be explicit.

> Perhaps figuring out how to document this is best left to a separate
> thread, and there's also the question of whether a new section that talks
> about this also ought to talk about anything else. But I feel like we're
> way overdue to do something about this.

Seconded, both that it needs to be addressed and that it should be done on a separate thread from this one.

--
Daniel Gustafsson
Re: Improve readability by using designated initializers when possible
On Mon, Mar 18, 2024 at 6:01 PM jian he wrote: > > On Mon, Mar 18, 2024 at 3:09 PM Peter Eisentraut wrote: > > > > On 14.03.24 01:26, Michael Paquier wrote: > > > -EventTriggerSupportsObjectClass(ObjectClass objclass) > > > +EventTriggerSupportsObject(const ObjectAddress *object) > > > > > > The shortcut introduced here is interesting, but it is inconsistent. > > > HEAD treats OCLASS_SUBSCRIPTION as something supported by event > > > triggers, but as pg_subscription is a shared catalog it would be > > > discarded with your change. Subscriptions are marked as supported in > > > the event trigger table: > > > https://www.postgresql.org/docs/devel/event-trigger-matrix.html > > > > Ah, good catch. Subscriptions are a little special there. Here is a > > new patch that keeps the switch/case arrangement in that function. That > > also makes it easier to keep the two EventTriggerSupports... functions > > aligned. Also added a note about subscriptions and a reference to the > > documentation. > > select relname from pg_class where relisshared and relkind = 'r'; > relname > --- > pg_authid > pg_subscription > pg_database > pg_db_role_setting > pg_tablespace > pg_auth_members > pg_shdepend > pg_shdescription > pg_replication_origin > pg_shseclabel > pg_parameter_acl > (11 rows) > also in function doDeletion we have: /* * These global object types are not supported here. */ case AuthIdRelationId: case DatabaseRelationId: case TableSpaceRelationId: case SubscriptionRelationId: case ParameterAclRelationId: elog(ERROR, "global objects cannot be deleted by doDeletion"); break; do we need to add other global objects? in the end, it does not matter since we have: default: elog(ERROR, "unsupported object class: %u", object->classId);
Re: Possibility to disable `ALTER SYSTEM`
On Fri, Mar 15, 2024 at 7:09 AM Jelte Fennema-Nio wrote: > On Fri, 15 Mar 2024 at 11:08, Daniel Gustafsson wrote: > > Another quirk for the documentation of this: if I disable ALTER SYSTEM I > > would > > assume that postgresql.auto.conf is no longer consumed, but it still is (and > > still need to be), so maybe "enable/disable" is the wrong choice of words? > > Updated the docs to reflect this quirk. But I kept the same name for > the GUC for now, because I couldn't come up with a better name myself. > If someone suggests a better name, I'm happy to change it though. Hmm. So in this patch, we have a whole new kind of GUC - guard rails - of which enable_alter_system is the first member. Is that what we want? I would have been somewhat inclined to find an existing section of postgresql.auto.conf for this parameter, perhaps "platform and version compatibility". But if we're going to add a bunch of similar GUCs, maybe grouping them all together is the way to go. Even if that is what we're going to do, do we want to call them "guard rails"? I'm not sure I'd find that name terribly clear, as a user. We know what we mean right now because we're having a very active discussion about this topic, but it might not seem as clear to someone coming at it fresh. On balance, I'm disinclined to add a new category for this. If we get to a point where we have several of these and we want to break them out into a new category, we can do it then. Maybe by that time the naming will seem more clear, too. I also don't think it's good enough to just say that this isn't a security feature. Talk is cheap. I think we need to say why it's not a security feature. So my proposal is something like this, taking a bunch of text from Jelte's patch and some inspiration from Magnus's earlier remarks: == When enable_alter_system is set to off, an error is returned if the ALTER SYSTEM command is used. This parameter can only be set in the postgresql.conf file or on the server command line. 
The default value is on. Note that this setting cannot be regarded as a security feature. It only disables the ALTER SYSTEM command. It does not prevent a superuser from changing the configuration remotely using other means. A superuser has many ways of executing shell commands at the operating system level, and can therefore modify postgresql.auto.conf regardless of the value of this setting. The purpose of the setting is to prevent accidental modifications via ALTER SYSTEM in environments where PostgreSQL's configuration is managed by some outside mechanism. In such environments, using ALTER SYSTEM to make configuration changes might appear to work, but then may be discarded at some point in the future when that outside mechanism updates the configuration. Setting this parameter to false can help to avoid such mistakes. == I agree with Daniel's comment that Tom's concerns about people filing CVEs are not without merit; indeed, I said the same thing in my first post to this thread. However, I also believe that's not a sufficient reason for rejecting a feature that many people seem to want. I think the root of this problem is that our documentation is totally unclear about the fact that we don't intend for there to be privilege separation between the operating system user and the PostgreSQL superuser; people want there to be a distinction, and think there is. Hence CVE-2019-9193, for example. Several people, including me, wrote blog posts about how that's not a security vulnerability, but while I was researching mine, I went looking for where in the documentation we actually SAY that there's no privilege separation between the OS user and the superuser. The only mention I found at the time was the PL/perlu documentation, which said this: "The writer of a PL/PerlU function must take care that the function cannot be used to do anything unwanted, since it will be able to do anything that could be done by a user logged in as the database administrator." 
That statement, from the official documentation, in my mind at least, DOES confirm that we don't intend privilege separation, but it's really oblique. You have to think through the fact that the superuser has to be the one to install plperlu, and that plperlu functions can usurp the OS user; since both of those things are documented to be the case, it follows that we know and expect that the superuser can usurp the OS user. But someone who is wondering how PostgreSQL's security model works is not going to read the plperlu documentation and make the inferences I just described. It's crazy to me that a principle frequently cited as gospel on this mailing list and others is nearly undocumented. Obviously, even if we did document it clearly, people could still get confused (or just disagree with our position) and file CVEs anyway, but we're not helping our case by having nothing to cite. A difficulty is where to PUT such a mention in the documentation. There's not a single section title in the top-level
Re: Building with meson on NixOS/nixpkgs
Hi,

Thank you for the patches!

On Sat, 16 Mar 2024 at 14:48, Wolfgang Walther wrote:
>
> To build on NixOS/nixpkgs I came up with a few small patches to
> meson.build. All of this works fine with Autoconf/Make already.

I do not have NixOS but I confirm that the patches apply cleanly to master and pass CI. I have some small feedback:

0001 & 0002: Adding code comments to explain why they have a fallback would be nice.
0003: Looks good to me.

--
Regards,
Nazir Bilal Yavuz
Microsoft
Re: Reducing connection overhead in pg_upgrade compat check phase
Attached is a fresh rebase with only minor cosmetic touch-ups which I would like to go ahead with during this CF. Peter: does this address the comments you had on translation and code duplication? -- Daniel Gustafsson v15-0001-pg_upgrade-run-all-data-type-checks-per-connecti.patch Description: Binary data
Re: BitmapHeapScan streaming read user and prelim refactoring
On 14/02/2024 21:42, Andres Freund wrote:
> On 2024-02-13 18:11:25 -0500, Melanie Plageman wrote:
>> patch 0004 is, I think, a bug fix. see [2].
>
> I'd not quite call it a bugfix, it's not like it leads to wrong behaviour.
> Seems more like an optimization. But whatever :)

It sure looks like a bug to me, albeit a very minor one. Certainly not an optimization, it doesn't affect performance in any way, only what EXPLAIN reports. So committed and backported that to all supported branches.

--
Heikki Linnakangas
Neon (https://neon.tech)
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
On Mon, Mar 18, 2024 at 8:57 PM Ashutosh Bapat wrote:
> On Mon, Mar 18, 2024 at 5:05 PM Amit Langote wrote:
>> On Mon, Mar 18, 2024 at 20:11 Ashutosh Bapat wrote:
>>> On Fri, Mar 15, 2024 at 11:45 AM Amit Langote wrote:
>>>> Could you please post the numbers with the palloc() / pfree() version?
>>>
>>> Here are they
>>>
>>>  tables | master  | patched
>>> --------+---------+---------
>>>       2 | 29 MB   | 28 MB
>>>       3 | 102 MB  | 93 MB
>>>       4 | 307 MB  | 263 MB
>>>       5 | 1076 MB | 843 MB
>>>
>>> The numbers look slightly different from my earlier numbers. But they were
>>> quite old. The patch used to measure memory that time is different from
>>> the one that we committed. So there's a slight difference in the way we
>>> measure memory as well.
>>
>> Sorry, I should've mentioned that I was interested in seeing cpu times to
>> compare the two approaches. Specifically, to see if the palloc / frees add
>> noticeable overhead.
>
> No problem. Here you go
>
>  tables |  master  | patched  | perc_change
> --------+----------+----------+-------------
>       2 |   477.87 |   492.32 |       -3.02
>       3 |  1970.83 |  1989.52 |       -0.95
>       4 |  6305.01 |  6268.81 |        0.57
>       5 | 19261.56 | 18684.86 |        2.99
>
> +ve change indicates reduced planning time. It seems that the planning time
> improves as the number of tables increases. But all the numbers are well
> within noise levels and thus may not show any improvement or deterioration.
> If you want, I can repeat the experiment.

Thanks. I also wanted to see cpu time difference between the two approaches of SpecialJoinInfo allocation -- on stack and on heap. So comparing times with the previous version of the patch using stack allocation and the new one with palloc. I'm hoping that the new approach is only worse in the noise range.

--
Thanks, Amit Langote
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
On Mon, Mar 18, 2024 at 5:05 PM Amit Langote wrote:
> On Mon, Mar 18, 2024 at 20:11 Ashutosh Bapat wrote:
>> Hi Amit,
>>
>> On Fri, Mar 15, 2024 at 11:45 AM Amit Langote wrote:
>>>>> That being said I'm a big fan of using a local variable on stack and
>>>>> filling it. I'd probably go with the usual palloc/pfree, because that
>>>>> makes it much easier to use - the callers would not be responsible for
>>>>> allocating the SpecialJoinInfo struct. Sure, it's a little bit of
>>>>> overhead, but with the AllocSet caching I doubt it's measurable.
>>>>
>>>> You are suggesting that instead of declaring a local variable of type
>>>> SpecialJoinInfo, build_child_join_sjinfo() should palloc() and return
>>>> SpecialJoinInfo which will be freed by free_child_sjinfo()
>>>> (free_child_sjinfo_members in the patch). I am fine with that.
>>>
>>> Agree with Tomas about using makeNode() / pfree(). Having the pfree()
>>> kind of makes it extra-evident that those SpecialJoinInfos are
>>> transitory.
>>
>> Attached patch-set
>>
>> 0001 - original patch as is
>> 0002 - addresses your first set of comments
>> 0003 - uses palloc and pfree to allocate and deallocate SpecialJoinInfo
>> structure.
>>
>> I will squash both 0002 and 0003 into 0001 once you review those changes
>> and are fine with those.
>
> Thanks for the new patches.
>
>>>>> I did put this through check-world on amd64/arm64, with valgrind,
>>>>> without any issue. I also tried the scripts shared by Ashutosh in his
>>>>> initial message (with some minor fixes, adding MEMORY to explain etc).
>>>>>
>>>>> The results with the 20240130 patches are like this:
>>>>>
>>>>>  tables  master  patched
>>>>>  -----------------------
>>>>>       2    40.8     39.9
>>>>>       3   151.7    142.6
>>>>>       4   464.0    418.5
>>>>>       5  1663.9   1419.5
>>>
>>> Could you please post the numbers with the palloc() / pfree() version?
>>
>> Here are they
>>
>>  tables | master  | patched
>> --------+---------+---------
>>       2 | 29 MB   | 28 MB
>>       3 | 102 MB  | 93 MB
>>       4 | 307 MB  | 263 MB
>>       5 | 1076 MB | 843 MB
>>
>> The numbers look slightly different from my earlier numbers. But they
>> were quite old. The patch used to measure memory that time is different
>> from the one that we committed. So there's a slight difference in the way
>> we measure memory as well.
>
> Sorry, I should've mentioned that I was interested in seeing cpu times to
> compare the two approaches. Specifically, to see if the palloc / frees add
> noticeable overhead.

No problem. Here you go

 tables |  master  | patched  | perc_change
--------+----------+----------+-------------
      2 |   477.87 |   492.32 |       -3.02
      3 |  1970.83 |  1989.52 |       -0.95
      4 |  6305.01 |  6268.81 |        0.57
      5 | 19261.56 | 18684.86 |        2.99

+ve change indicates reduced planning time. It seems that the planning time improves as the number of tables increases. But all the numbers are well within noise levels and thus may not show any improvement or deterioration. If you want, I can repeat the experiment.

--
Best Wishes,
Ashutosh Bapat
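For reference, the perc_change column in the timing table above is just the relative planning-time difference, with a positive value meaning the patched version planned faster. A one-liner reproduces it (a sketch; assumes the timings are in ms, as in the table):

```python
def perc_change(master_ms, patched_ms):
    """Relative planning-time change, in percent.
    Positive result = patched is faster (reduced planning time)."""
    return round((master_ms - patched_ms) / master_ms * 100, 2)

print(perc_change(477.87, 492.32))      # -3.02  (tables = 2 row above)
print(perc_change(19261.56, 18684.86))  # 2.99   (tables = 5 row above)
```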
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
On Mon, Mar 18, 2024 at 20:11 Ashutosh Bapat wrote: > Hi Amit, > > > On Fri, Mar 15, 2024 at 11:45 AM Amit Langote > wrote: > >> > > >> > > That being said I'm a big fan of using a local variable on stack and >> > > filling it. I'd probably go with the usual palloc/pfree, because that >> > > makes it much easier to use - the callers would not be responsible for >> > > allocating the SpecialJoinInfo struct. Sure, it's a little bit of >> > > overhead, but with the AllocSet caching I doubt it's measurable. >> > >> > You are suggesting that instead of declaring a local variable of type >> > SpecialJoinInfo, build_child_join_sjinfo() should palloc() and return >> > SpecialJoinInfo which will be freed by free_child_sjinfo() >> > (free_child_sjinfo_members in the patch). I am fine with that. >> >> Agree with Tomas about using makeNode() / pfree(). Having the pfree() >> kind of makes it extra-evident that those SpecialJoinInfos are >> transitory. >> > > Attached patch-set > > 0001 - original patch as is > 0002 - addresses your first set of comments > 0003 - uses palloc and pfree to allocate and deallocate SpecialJoinInfo > structure. > > I will squash both 0002 and 0003 into 0001 once you review those changes > and are fine with those. > Thanks for the new patches. > > I did put this through check-world on amd64/arm64, with valgrind, >> > > without any issue. I also tried the scripts shared by Ashutosh in his >> > > initial message (with some minor fixes, adding MEMORY to explain etc). >> > > >> > > The results with the 20240130 patches are like this: >> > > >> > >tablesmasterpatched >> > > - >> > > 2 40.8 39.9 >> > > 3 151.7 142.6 >> > > 4 464.0 418.5 >> > > 51663.9 1419.5 >> >> Could you please post the numbers with the palloc() / pfree() version? >> >> > Here are they > tables | master | patched > +-+- > 2 | 29 MB | 28 MB > 3 | 102 MB | 93 MB > 4 | 307 MB | 263 MB > 5 | 1076 MB | 843 MB > > The numbers look slightly different from my earlier numbers. 
But they were > quite old. The patch used to measure memory that time is different from the > one that we committed. So there's a slight difference in the way we measure > memory as well. > Sorry, I should’ve mentioned that I was interested in seeing cpu times to compare the two approaches. Specifically, to see if the palloc / frees add noticeable overhead.
Re: BitmapHeapScan streaming read user and prelim refactoring
On 3/17/24 20:36, Tomas Vondra wrote:
>
> ...
>
>> Besides a lot of other things, I finally added debugging fprintfs printing
>> the pid, (prefetch, read), block number. Even looking at tiny excerpts of
>> the large amount of output that generates shows that two iterators were
>> out of sync.
>
> Thanks. I did experiment with fprintf, but it's quite cumbersome, so I
> was hoping you came up with some smart way to trace this kind of stuff.
> For example I was wondering if ebpf would be a more convenient way.

FWIW I just realized why I failed to identify this "late prefetch" issue during my investigation. I was experimenting with instrumenting this by adding an LD_PRELOAD library, logging all pread/fadvise calls. But the FilePrefetch call is skipped if the page is already in shared buffers, so this case "disappeared" during processing, which matched the two calls by doing an "inner join".

That being said, I think tracing this using LD_PRELOAD or perf may be a more convenient way to see what's happening. For example I ended up doing this:

    perf record -a -e syscalls:sys_enter_fadvise64 \
                   -e syscalls:sys_exit_fadvise64 \
                   -e syscalls:sys_enter_pread64 \
                   -e syscalls:sys_exit_pread64

    perf script -ns

Alternatively, perf-trace can be used and prints the filename too (but time has ms resolution only). Processing this seems comparable to the fprintf approach. It still has the issue that some of the fadvise calls may be absent if the prefetch iterator gets too far behind, but I think that can be detected / measured by simply counting the fadvise calls, and comparing them to pread calls. We expect these to be about the same, so (#pread - #fadvise) / #fadvise is a measure of how many were "late" and skipped. It also seems better than fprintf because it traces the actual syscalls, not just calls to glibc wrappers.
For example I saw this:

    postgres 54769 [001] 33768.771524828: syscalls:sys_enter_pread64: ..., pos: 0x30d04000
    postgres 54769 [001] 33768.771526867: syscalls:sys_exit_pread64: 0x2000
    postgres 54820 [000] 33768.771527473: syscalls:sys_enter_fadvise64: ..., offset: 0x30d04000, ...
    postgres 54820 [000] 33768.771528320: syscalls:sys_exit_fadvise64: 0x0

which is clearly a case where we issue fadvise after pread of the same block already completed.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
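The (#pread - #fadvise) / #fadvise measure described above can be computed straight from the syscall counts in a perf-script-style trace. A minimal sketch (the trace lines are a made-up excerpt, loosely modeled on the output shown in this thread):

```python
def late_prefetch_ratio(trace_lines):
    """Estimate the fraction of reads whose prefetch was skipped ("late"),
    by comparing pread and fadvise entry counts in the trace."""
    n_pread = sum(1 for l in trace_lines if "sys_enter_pread64" in l)
    n_fadvise = sum(1 for l in trace_lines if "sys_enter_fadvise64" in l)
    # Roughly one fadvise per pread is expected; the shortfall is the
    # number of prefetches skipped because the iterator fell behind.
    return (n_pread - n_fadvise) / n_fadvise

# Hypothetical excerpt: 4 reads, but only 3 prefetches were issued.
trace = [
    "postgres ... sys_enter_fadvise64: offset: 0x30d04000",
    "postgres ... sys_enter_pread64: pos: 0x30d04000",
    "postgres ... sys_enter_fadvise64: offset: 0x30d06000",
    "postgres ... sys_enter_pread64: pos: 0x30d06000",
    "postgres ... sys_enter_fadvise64: offset: 0x30d08000",
    "postgres ... sys_enter_pread64: pos: 0x30d08000",
    "postgres ... sys_enter_pread64: pos: 0x30d0a000",
]
```

Here late_prefetch_ratio(trace) gives 1/3: one of the four reads had no matching fadvise.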
Re: Add pg_basetype() function to obtain a DOMAIN base type
On Mon, Mar 18, 2024 at 2:01 AM jian he wrote:
>
> looking at it again.
> I found out we can just simply do
>
>     Datum
>     pg_basetype(PG_FUNCTION_ARGS)
>     {
>         Oid oid;
>
>         oid = get_fn_expr_argtype(fcinfo->flinfo, 0);
>         PG_RETURN_OID(getBaseType(oid));
>     }
>
> if the type is not a domain, it works the same as pg_typeof.
> if the type is a domain, pg_typeof returns it as is, pg_basetype returns the
> base type. So the two only diverge when the argument is of a domain type.
>
> the doc:
>     pg_basetype ( "any" ) regtype
>
>     Returns the OID of the base type of a domain. If the argument is not
>     a domain type, return the OID of the data type of the argument just
>     like pg_typeof(). If there's a chain of domain dependencies, it will
>     recurse until finding the base type.
>
> also, I think this way, we only do one syscache lookup.

Looks good to me. But should it be named pg_basetypeof()?

--
Regards,
Alexander Korotkov
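The recursion described in the proposed docs can be pictured with a toy lookup table — a Python stand-in for the pg_type walk that getBaseType() performs (the domain names below are made up for illustration):

```python
def base_type(typename, domain_base):
    """Follow a chain of domain -> base-type links until a non-domain
    type is reached, mirroring what getBaseType() does over pg_type."""
    while typename in domain_base:
        typename = domain_base[typename]
    return typename

# e.g. CREATE DOMAIN us_postal_code AS text; CREATE DOMAIN zip9 AS us_postal_code;
domains = {"zip9": "us_postal_code", "us_postal_code": "text"}
print(base_type("zip9", domains))  # text
print(base_type("int4", domains))  # int4 -- not a domain, returned as-is, like pg_typeof
```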
Re: speed up a logical replica setup
On 18.03.24 06:43, Hayato Kuroda (Fujitsu) wrote:

02. "The main difference between the logical replication setup and pg_createsubscriber is the initial data copy." Grammarly suggests: "The initial data copy is the main difference between the logical replication setup and pg_createsubscriber."

I think that change is worse.

09. "The source server must accept connections from the target server. The source server must not be in recovery." Grammarly suggests: "The source server must accept connections from the target server and not be in recovery."

I think the previous version is better.

17. "specifies promote" We can double-quote the word promote.

The v30 patch has promote, which I think is adequate.

19. "is accepting read-write transactions" Grammarly suggests: "accepts read-write transactions"

I like the first one better.

20. New options must be documented as well. This helps not only users but also reviewers. (Sometimes we cannot identify whether the implementation is intentional or not.)

Which ones are missing?

21. Also, I am not sure the specification is good. I preferred to specify them by a format string, because it can reduce the number of arguments, and I cannot find use cases in which users want to control the names of objects. However, your approach has the benefit that users can easily identify the objects generated by pg_createsubscriber. What do others think?

I think listing them explicitly is better for the first version. It's simpler to implement and more flexible.

22.
```
#define BASE_OUTPUT_DIR "pg_createsubscriber_output.d"
```
No one refers to the define.

This is gone in v30.

23.
```
} CreateSubscriberOptions;
...
} LogicalRepInfo;
```
Declarations after the "{" are not needed, because we do not do typedef.

Yeah, this is actually wrong, because as it is written now, it defines global variables.

22. While looking at the definitions of functions, I found that some pointers are declared as const, but others are not.
E.g., "char *lsn" in setup_recovery() won't be changed but is not const. Is it just missing, or is there another rule?

Yes, more could be done here. I have attached a patch for this. (This also requires the just-committed 48018f1d8c.)

24.
```
/* standby / subscriber data directory */
static char *subscriber_dir = NULL;
```
It is a bit strange that only subscriber_dir is a global variable. Callers require the CreateSubscriberOptions as an argument, except cleanup_objects_atexit() and main(). So, how about making `CreateSubscriberOptions opt` a global one?

Fewer global variables seem preferable. Only make them global as needed.

30.
```
if (num_replslots == 0)
    dbinfo[i].replslotname = pg_strdup(genname);
```
I think the straightforward way is to use the name of the subscription if no name is specified. This follows the rule for CREATE SUBSCRIPTION.

right

31.
```
/* Create replication slot on publisher */
if (lsn)
    pg_free(lsn);
```
I think allocating/freeing memory is not so efficient. Can we add a flag to create_logical_replication_slot() for controlling the returned value (NULL or duplicated string)? We can use the condition (i == num_dbs - 1) as the flag.

Nothing is even using the return value of create_logical_replication_slot(). I think this can be removed altogether.

34.
```
{"config-file", required_argument, NULL, 1},
{"publication", required_argument, NULL, 2},
{"replication-slot", required_argument, NULL, 3},
{"subscription", required_argument, NULL, 4},
```
The ordering looks strange to me. According to pg_upgrade and pg_basebackup, options which do not have a short notation are listed at the end.

35.
```
opt.sub_port = palloc(16);
```
Per other lines, pg_alloc() should be used.

Even better, psprintf().

37.
```
/* Register a function to clean up objects in case of failure */
atexit(cleanup_objects_atexit);
```
Sorry if we have already discussed this. I think the registration can be moved to just before the boot of the standby. Before that, the callback will be a no-op.
But it can also stay where it is. What is the advantage of moving it later?

40.
```
* XXX this code was extracted from BootStrapXLOG().
```
So, can we extract the common part to somewhere? Since the system identifier is related to the controldata file, I think it can be located in controldata_util.c.

Let's leave it as is for this PG release.

From d951a9f186b2162aa241f7554908b236c718154f Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Mon, 18 Mar 2024 11:54:03 +0100
Subject: [PATCH] fixup! pg_createsubscriber: creates a new logical replica from a standby server

Add const decorations.
---
 src/bin/pg_basebackup/pg_createsubscriber.c | 54 ++---
 1 file changed, 27 insertions(+), 27
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
Hi Amit,

On Fri, Mar 15, 2024 at 11:45 AM Amit Langote wrote:
> > > That being said I'm a big fan of using a local variable on stack and
> > > filling it. I'd probably go with the usual palloc/pfree, because that
> > > makes it much easier to use - the callers would not be responsible for
> > > allocating the SpecialJoinInfo struct. Sure, it's a little bit of
> > > overhead, but with the AllocSet caching I doubt it's measurable.
> >
> > You are suggesting that instead of declaring a local variable of type
> > SpecialJoinInfo, build_child_join_sjinfo() should palloc() and return
> > SpecialJoinInfo which will be freed by free_child_sjinfo()
> > (free_child_sjinfo_members in the patch). I am fine with that.
>
> Agree with Tomas about using makeNode() / pfree(). Having the pfree()
> kind of makes it extra-evident that those SpecialJoinInfos are
> transitory.

Attached patch-set:
0001 - original patch as is
0002 - addresses your first set of comments
0003 - uses palloc and pfree to allocate and deallocate the SpecialJoinInfo structure.

I will squash both 0002 and 0003 into 0001 once you review those changes and are fine with them.

> > > I did put this through check-world on amd64/arm64, with valgrind,
> > > without any issue. I also tried the scripts shared by Ashutosh in his
> > > initial message (with some minor fixes, adding MEMORY to explain etc).
> > >
> > > The results with the 20240130 patches are like this:
> > >
> > >  tables    master    patched
> > >  ---------------------------
> > >  2         40.8      39.9
> > >  3         151.7     142.6
> > >  4         464.0     418.5
> > >  5         1663.9    1419.5
>
> Could you please post the numbers with the palloc() / pfree() version?

Here they are:

 tables | master  | patched
--------+---------+---------
      2 | 29 MB   | 28 MB
      3 | 102 MB  | 93 MB
      4 | 307 MB  | 263 MB
      5 | 1076 MB | 843 MB

The numbers look slightly different from my earlier numbers, but those were quite old. The patch used to measure memory that time is different from the one that we committed.
So there's a slight difference in the way we measure memory as well. -- Best Wishes, Ashutosh Bapat From fc14dd22b1a150e82f99cad4440485daaf65eb00 Mon Sep 17 00:00:00 2001 From: Ashutosh Bapat Date: Wed, 27 Sep 2023 16:29:11 +0530 Subject: [PATCH 2/3] Address Amit's comments --- src/backend/optimizer/path/joinrels.c | 13 + src/include/nodes/pathnodes.h | 4 2 files changed, 9 insertions(+), 8 deletions(-) diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c index d972aef988..05b3a6c017 100644 --- a/src/backend/optimizer/path/joinrels.c +++ b/src/backend/optimizer/path/joinrels.c @@ -1735,6 +1735,10 @@ build_child_join_sjinfo(PlannerInfo *root, SpecialJoinInfo *parent_sjinfo, /* * free_child_sjinfo_members * Free memory consumed by members of a child SpecialJoinInfo. + * + * Only members that are translated copies of their counterpart in the parent + * SpecialJoinInfo are freed here. However, members that could be referenced + * elsewhere are not freed. */ static void free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo) @@ -1746,19 +1750,12 @@ free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo) if (child_sjinfo->jointype == JOIN_INNER) return; - /* - * The relids are used only for comparison and their references are not - * stored anywhere. Free those. - */ bms_free(child_sjinfo->min_lefthand); bms_free(child_sjinfo->min_righthand); bms_free(child_sjinfo->syn_lefthand); bms_free(child_sjinfo->syn_righthand); - /* - * But the list of operator OIDs and the list of expressions may be - * referenced somewhere else. Do not free those. - */ + /* semi_rhs_exprs may be referenced, so don't free. 
*/ } /* diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index 534692bee1..7192ae5171 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -2854,6 +2854,10 @@ typedef struct PlaceHolderVar * cost estimation purposes it is sometimes useful to know the join size under * plain innerjoin semantics. Note that lhs_strict and the semi_xxx fields * are not set meaningfully within such structs. + * + * We also create transient SpecialJoinInfos for child joins by translating + * corresponding parent SpecialJoinInfos. These are also not part of + * PlannerInfo's join_info_list. */ #ifndef HAVE_SPECIALJOININFO_TYPEDEF typedef struct SpecialJoinInfo SpecialJoinInfo; -- 2.25.1 From 5a12690afe180defb8cc4f923fb0c4b89c2153a1 Mon Sep 17 00:00:00 2001 From: Ashutosh Bapat Date: Mon, 24 Jul 2023 20:20:53 +0530 Subject: [PATCH 1/3] Reduce memory used by child SpecialJoinInfo The SpecialJoinInfo applicable to a child join is computed by translating the same applicable to the parent join in try_partitionwise_join(). The child SpecialJoinInfo is not needed once the child join RelOptInfo is
Re: speed up a logical replica setup
On 18.03.24 08:18, vignesh C wrote:

1) The maximum size of an object name is 64, so we can have a check that we don't specify more than the maximum allowed length:

+ case 3:
+     if (!simple_string_list_member(_names, optarg))
+     {
+         simple_string_list_append(_names, optarg);
+         num_replslots++;
+     }
+     else
+     {
+         pg_log_error("duplicate replication slot \"%s\"", optarg);
+         exit(1);
+     }
+     break;

If we allow something like this:

./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431 user=postgres" -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d db2 -d db3 --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1" --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2" --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3"

then creation of the replication slot will fail:

pg_createsubscriber: error: could not create replication slot "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes" on database "db2": ERROR: replication slot "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes" already exists

I think this is fine. The server can check whether the names it is given are of the right size. We don't need to check it again in the client.
2) Similarly here too:

+ case 4:
+     if (!simple_string_list_member(_names, optarg))
+     {
+         simple_string_list_append(_names, optarg);
+         num_subs++;
+     }
+     else
+     {
+         pg_log_error("duplicate subscription \"%s\"", optarg);
+         exit(1);
+     }
+     break;

If we allow something like this:

./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431 user=postgres" -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d db2 -d db3 --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1 --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2 --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3

then the subscriptions will be created with the same name, and later there will be a problem when setting replication progress, as there will be multiple subscriptions with the same name.

Could you clarify this problem?
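[Editorial note] The collision in the examples above comes from identifier truncation: PostgreSQL stores names in a fixed NAMEDATALEN = 64 buffer, so identifiers are cut to 63 bytes, and the three 64-character names differ only in the byte that gets cut off. A rough client-side sketch of the effect (plain Python, not pg_createsubscriber code; whether a given server code path truncates or rejects the name depends on the context, but the quoted "already exists" error shows truncation happened here):

```python
NAMEDATALEN = 64  # PostgreSQL's compiled-in identifier buffer size

def server_name(name: str) -> str:
    """Truncate an identifier the way the server stores it (byte-wise)."""
    return name.encode()[:NAMEDATALEN - 1].decode(errors="ignore")

# Three 64-char names that differ only in the last character, like the
# --replication-slot/--subscription examples above.
names = ["t" * 63 + suffix for suffix in "123"]
truncated = {server_name(n) for n in names}
print(len(names), len(truncated))  # -> 3 1  (all three collide server-side)
```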
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi,

On Sat, Mar 16, 2024 at 09:29:01AM +0530, Bharath Rupireddy wrote:
> Please see the attached v11 patch set with all the above review
> comments addressed.

Thanks! Looking at 0001:

1 ===

+ True if this logical slot conflicted with recovery (and so is now
+ invalidated). When this column is true, check

Worth adding back the mention for physical slots, "Always NULL for physical slots."?

2 ===

@@ -1023,9 +1023,10 @@ CREATE VIEW pg_replication_slots AS
         L.wal_status,
         L.safe_wal_size,
         L.two_phase,
-        L.conflict_reason,
+        L.conflicting,
         L.failover,
-        L.synced
+        L.synced,
+        L.invalidation_reason

What about placing invalidation_reason close to conflict_reason?

3 ===

- * Maps a conflict reason for a replication slot to
+ * Maps a invalidation reason for a replication slot to

s/a invalidation/an invalidation/?

4 ===

While at it, shouldn't we also rename "conflict" to say "invalidation_cause" in InvalidatePossiblyObsoleteSlot()?

5 ===

+ * rows_removed and wal_level_insufficient are only two reasons

s/are only two/are the only two/?

Regards,

-- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: remaining sql/json patches
Himanshu,

On Mon, Mar 18, 2024 at 4:57 PM Himanshu Upadhyaya wrote:
> I have tested a nested case, but why is the negative number allowed in the
> subscript (NESTED '$.phones[-1]' COLUMNS)? It should error out if the number
> is negative.
>
> postgres[170683]=# SELECT * FROM JSON_TABLE(jsonb '{
>     "id" : "0.234567897890",
>     "name" : { "first":"John", "last":"Doe" },
>     "phones" : [{"type":"home", "number":"555-3762"},
>                 {"type":"work", "number":"555-7252", "test":123}]}',
>     '$'
>     COLUMNS(
>         id numeric(2,2) PATH 'lax $.id',
>         last_name varCHAR(10) PATH 'lax $.name.last',
>         first_name VARCHAR(10) PATH 'lax $.name.first',
>         NESTED '$.phones[-1]' COLUMNS (
>             "type" VARCHAR(10),
>             "number" VARCHAR(10)
>         )
>     )
> ) as t;
>   id  | last_name | first_name | type | number
> ------+-----------+------------+------+--------
>  0.23 | Doe       | Johnnn     |      |
> (1 row)

You're not getting an error because the default mode of handling structural errors in SQL/JSON path expressions is "lax". If you say "strict" in the path string, you will get an error:

SELECT * FROM JSON_TABLE(jsonb '{
    "id" : "0.234567897890",
    "name" : { "first":"John", "last":"Doe" },
    "phones" : [{"type":"home", "number":"555-3762"},
                {"type":"work", "number":"555-7252", "test":123}]}',
    '$'
    COLUMNS(
        id numeric(2,2) PATH 'lax $.id',
        last_name varCHAR(10) PATH 'lax $.name.last',
        first_name VARCHAR(10) PATH 'lax $.name.first',
        NESTED 'strict $.phones[-1]' COLUMNS (
            "type" VARCHAR(10),
            "number" VARCHAR(10)
        )
    )
    error on error
) as t;
ERROR:  jsonpath array subscript is out of bounds

-- Thanks, Amit Langote
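[Editorial sketch] The lax-vs-strict distinction explained in the reply can be illustrated with a toy model (plain Python, not the server's jsonpath evaluator): lax mode swallows structural errors such as an out-of-range subscript and simply yields no match, while strict mode raises them.

```python
def eval_subscript(arr, idx, mode="lax"):
    """Toy model of a jsonpath array subscript; -1 is out of range because
    SQL/JSON path subscripts are 0-based with no negative indexing."""
    if 0 <= idx < len(arr):
        return [arr[idx]]
    if mode == "strict":
        raise ValueError("jsonpath array subscript is out of bounds")
    return []  # lax: structural errors are suppressed, yielding no rows

phones = [{"type": "home"}, {"type": "work"}]
print(eval_subscript(phones, -1))  # -> []  (lax: silently empty, no error)
try:
    eval_subscript(phones, -1, mode="strict")
except ValueError as e:
    print(e)  # strict mode raises, matching the ERROR in the session above
```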
Re: Catalog domain not-null constraints
Hi,

> Anyway, in order to move this forward, here is an updated patch where
> the ADD CONSTRAINT ... NOT NULL behavior for domains matches the
> idempotent behavior of tables. This uses the patch that Jian He posted.

I tested the patch on a Raspberry Pi 5 and an Intel MacBook and also experimented with it. Everything seems to work properly.

Personally, I believe new functions such as validateDomainNotNullConstraint() and findDomainNotNullConstraint() could use a few lines of comments (accepts..., returns..., etc). Also, I think the commit message should explicitly say that supporting NOT VALID constraints is out of scope of this patch.

Except for the named nitpicks, v4 LGTM.

-- Best regards, Aleksander Alekseev
Re: Improve readability by using designated initializers when possible
On Mon, Mar 18, 2024 at 3:09 PM Peter Eisentraut wrote:
> On 14.03.24 01:26, Michael Paquier wrote:
> > -EventTriggerSupportsObjectClass(ObjectClass objclass)
> > +EventTriggerSupportsObject(const ObjectAddress *object)
> >
> > The shortcut introduced here is interesting, but it is inconsistent.
> > HEAD treats OCLASS_SUBSCRIPTION as something supported by event
> > triggers, but as pg_subscription is a shared catalog it would be
> > discarded with your change. Subscriptions are marked as supported in
> > the event trigger table:
> > https://www.postgresql.org/docs/devel/event-trigger-matrix.html
>
> Ah, good catch. Subscriptions are a little special there. Here is a
> new patch that keeps the switch/case arrangement in that function. That
> also makes it easier to keep the two EventTriggerSupports... functions
> aligned. Also added a note about subscriptions and a reference to the
> documentation.

select relname from pg_class where relisshared and relkind = 'r';
        relname
-----------------------
 pg_authid
 pg_subscription
 pg_database
 pg_db_role_setting
 pg_tablespace
 pg_auth_members
 pg_shdepend
 pg_shdescription
 pg_replication_origin
 pg_shseclabel
 pg_parameter_acl
(11 rows)

EventTriggerSupportsObject should return false for the following: SharedSecLabelRelationId, SharedDescriptionRelationId, DbRoleSettingRelationId, SharedDependRelationId; but I am not sure about ReplicationOriginRelationId.
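[Editorial sketch] The rule under discussion can be modeled roughly as follows (plain Python, not the C implementation). The catalog names come from the query output above; pg_subscription is special-cased as supported, and whether pg_replication_origin should also be excluded is exactly the open question in this message:

```python
# Shared catalogs, per the pg_class query quoted above.
SHARED_CATALOGS = {
    "pg_authid", "pg_subscription", "pg_database", "pg_db_role_setting",
    "pg_tablespace", "pg_auth_members", "pg_shdepend", "pg_shdescription",
    "pg_replication_origin", "pg_shseclabel", "pg_parameter_acl",
}

def event_trigger_supports(catalog: str) -> bool:
    """Event triggers fire for database-local objects only, with
    pg_subscription special-cased as supported (see the event trigger
    matrix referenced above)."""
    if catalog == "pg_subscription":
        return True
    return catalog not in SHARED_CATALOGS

print(event_trigger_supports("pg_subscription"))  # -> True (the exception)
print(event_trigger_supports("pg_shdepend"))      # -> False (shared catalog)
print(event_trigger_supports("pg_class"))         # -> True (local catalog)
```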
Re: [HACKERS] make async slave to wait for lsn to be replayed
On Mon, Mar 18, 2024 at 5:17 AM Amit Kapila wrote: > On Sun, Mar 17, 2024 at 7:40 PM Alexander Korotkov wrote: > > On Sat, Mar 16, 2024 at 5:05 PM Bharath Rupireddy > > wrote: > > > On Sat, Mar 16, 2024 at 4:26 PM Amit Kapila wrote: > > > > In general, it seems this patch has been stuck for a long time on the > > > > decision to choose an appropriate UI (syntax), and we thought of > > > > moving it further so that the other parts of the patch can be > > > > reviewed/discussed. So, I feel before pushing this we should see > > > > comments from a few (at least two) other senior members who earlier > > > > shared their opinion on the syntax. I know we don't have much time > > > > left but OTOH pushing such a change (where we didn't have a consensus > > > > on syntax) without much discussion at this point of time could lead to > > > > discussions after commit. > > > > > > +1 to gain consensus first on the syntax changes. With this, we might > > > be violating the SQL standard for explicit txn commands (I stand for > > > correction about the SQL standard though). > > > > Thank you for your feedback. Generally, I agree it's correct to get > > consensus on syntax first. And yes, this patch has been here since > > 2016. We didn't get consensus for syntax for 8 years. Frankly > > speaking, I don't see a reason why this shouldn't take another 8 > > years. At the same time the ability to wait on standby given LSN is > > replayed seems like pretty basic and simple functionality. Thus, it's > > quite frustrating it already took that long and still unclear when/how > > this could be finished. > > > > My current attempt was to commit minimal implementation as less > > invasive as possible. A new clause for BEGIN doesn't require > > additional keywords and doesn't introduce additional statements. But > > yes, this is still a new qual. And, yes, Amit you're right that even > > if I had committed that, there was still a high risk of further > > debates and revert. 
> > Given my skepticism about agreement over syntax, I'd like to check
> > another time if we could go without new syntax at all. There was an
> > attempt to implement waiting for lsn as a function. But a function
> > holds a snapshot, which could prevent WAL records from being replayed.
> > Releasing the snapshot could break the parent query. But now we have
> > procedures, which need a dedicated statement for the call and can even
> > control transactions. Could we implement a waitlsn in a procedure
> > that:
> >
> > 1. First, checks that it was called with a non-atomic context (that is,
> > it's not called within a transaction). Triggers an error if called with
> > an atomic context.
> > 2. Releases the snapshot to be able to wait without the risk of WAL
> > replay getting stuck. The procedure is still called within a snapshot.
> > It's a bit of a hack to release a snapshot, but Vacuum statements
> > already do so.
>
> Can you please provide a bit more detail, with some example, of what the
> existing problem with functions is and how using procedures will
> resolve it? How will this address the implicit transaction case,
> or do we have any other workaround for those cases?

Please check [1] and [2] for the explanation of the problem with functions.

Also, please find a draft patch implementing the procedure. The issue with the snapshot is addressed with the following lines. We first ensure we're in a non-atomic context, then pop the active snapshot (tricky, but ExecuteVacuum() does the same). Then we should have no active snapshot, and it's safe to wait for lsn replay.

if (context->atomic)
    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("pg_wait_lsn() must be only called in non-atomic context")));

if (ActiveSnapshotSet())
    PopActiveSnapshot();
Assert(!ActiveSnapshotSet());

The procedure call could be added either before the BEGIN statement or before the implicit transaction.
CALL pg_wait_lsn('my_lsn', my_timeout); BEGIN; CALL pg_wait_lsn('my_lsn', my_timeout); SELECT ...; Links 1. https://www.postgresql.org/message-id/CAPpHfduBSN8j5j5Ynn5x%3DThD%3D8ypNd53D608VXGweBsPzxPvqA%40mail.gmail.com 2. https://www.postgresql.org/message-id/CAPpHfdtiGgn0iS1KbW2HTam-1%2BoK%2BvhXZDAcnX9hKaA7Oe%3DF-A%40mail.gmail.com -- Regards, Alexander Korotkov v11-0001-Implement-AFTER-clause-for-BEGIN-command.patch Description: Binary data
Re: Refactoring backend fork+exec code
On 13/03/2024 09:30, Heikki Linnakangas wrote: Attached is a new version of the remaining patches. Committed, with some final cosmetic cleanups. Thanks everyone! -- Heikki Linnakangas Neon (https://neon.tech)
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi,

On Mon, Mar 18, 2024 at 08:50:56AM +0530, Amit Kapila wrote:
> On Sun, Mar 17, 2024 at 2:03 PM Bharath Rupireddy wrote:
> >
> > On Sat, Mar 16, 2024 at 3:55 PM Amit Kapila wrote:
> > >
> > > procArray->replication_slot_catalog_xmin) but then don't adjust it for
> > > 'max_slot_xid_age'. I could be missing something in this but it is
> > > better to keep discussing this and try to move with another parameter
> > > 'inactive_replication_slot_timeout' which according to me can be kept
> > > at slot level instead of a GUC but OTOH we need to see the arguments
> > > on both side and then decide which makes more sense.
> >
> > Hm. Are you suggesting inactive_timeout to be a slot-level parameter
> > similar to the 'failover' property added recently by
> > c393308b69d229b664391ac583b9e07418d411b6 and
> > 73292404370c9900a96e2bebdc7144f7010339cf? With this approach, one can
> > set inactive_timeout while creating the slot either via
> > pg_create_physical_replication_slot() or
> > pg_create_logical_replication_slot() or the CREATE_REPLICATION_SLOT or
> > ALTER_REPLICATION_SLOT command, and postgres tracks the
> > last_inactive_at for every slot, based on which the slot gets
> > invalidated. If this understanding is right, I can go ahead and work
> > towards it.
>
> Yeah, I have something like that in mind. You can prepare the patch
> but it would be good if others involved in this thread can also share
> their opinion.

I think it makes sense to put the inactive_timeout granularity at the slot level (as the activity could vary a lot, say, between one slot linked to a subscription and one linked to some plugins). As for max_slot_xid_age, I have the feeling that a new GUC is good enough.

> > Alternatively, we can go the route of making the GUC a list of key-value
> > pairs of {slot_name, inactive_timeout}, but this kind of GUC for
> > setting slot-level parameters is going to be the first of its kind, so
> > I'd prefer the above approach.
> > > > I would prefer a slot-level parameter in this case rather than a GUC. Yeah, same here. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Autogenerate some wait events code and documentation
On 2023-Aug-28, Michael Paquier wrote:

> I have looked again at that, and switching wait_event_names.txt to use
> two columns made of the typedef definitions and the docs like this is
> not a problem:
>
> FOO_BAR "Waiting on Foo Bar."
>
> WAIT_EVENT_ is appended to the typedef definitions in the script. The
> wait event names like "FooBar" are generated from the enums by
> splitting using their underscores and doing some lc(). Lock and
> LWLock don't need to change. This way, it is easy to grep the wait
> events from the source code and match them with wait_event_names.txt.

FTR I had a rather unpleasant time last week upon finding a wait event named PgSleep. If you grep for that, there are no matches at all, and I spent ten minutes (for real) trying to figure out where that was coming from, until I remembered this thread. Now you have to guess that not only random lowercasing is happening, but also underscore removal.

This is not a good developer experience and I think we should rethink this choice. It would be infinitely more usable, and not one bit more difficult, to make these lines be

WAIT_EVENT_FOO_BAR	FooBar	"Waiting on Foo Bar."

Then there is no guessing involved.

-- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "The important things in the world are problems with society that we don't understand at all. The machines will become more complicated but they won't be more complicated than the societies that run them." (Freeman Dyson)
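[Editorial sketch] The mangling being complained about can be sketched in a few lines (an approximation of what the generator script does, not the script itself), which also makes plain why grepping for the displayed name finds nothing in the source:

```python
def display_name(enum_name: str) -> str:
    """Approximate the enum-to-display-name mangling: strip the
    WAIT_EVENT_ prefix, split on underscores, lowercase each piece,
    capitalize it, and join -- so WAIT_EVENT_PG_SLEEP becomes PgSleep."""
    parts = enum_name.removeprefix("WAIT_EVENT_").split("_")
    return "".join(p.capitalize() for p in parts)

print(display_name("WAIT_EVENT_PG_SLEEP"))  # -> PgSleep
print(display_name("WAIT_EVENT_FOO_BAR"))   # -> FooBar
```

The string "PgSleep" never appears in the source tree; only "PG_SLEEP" does, which is the grep-ability problem Álvaro raises.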
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Sat, Mar 16, 2024 at 09:29:01AM +0530, Bharath Rupireddy wrote: > I've also responded to Bertrand's comments here. Thanks! > > On Wed, Mar 6, 2024 at 3:56 PM Bertrand Drouvot > wrote: > > > > 5 === > > > > -# Check conflict_reason is NULL for physical slot > > +# Check invalidation_reason is NULL for physical slot > > $res = $node_primary->safe_psql( > > 'postgres', qq[ > > -SELECT conflict_reason is null FROM pg_replication_slots > > where slot_name = '$primary_slotname';] > > +SELECT invalidation_reason is null FROM > > pg_replication_slots where slot_name = '$primary_slotname';] > > ); > > > > > > I don't think this test is needed anymore: it does not make that much sense > > since > > it's done after the primary database initialization and startup. > > It is now turned into a test verifying 'conflicting boolean' is null > for the physical slot. Isn't that okay? Yeah makes more sense now, thanks! Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Autogenerate some wait events code and documentation
Hi,

On Mon, Mar 18, 2024 at 08:02:24AM +0900, Michael Paquier wrote:
> On Sun, Mar 17, 2024 at 11:31:14AM -0700, Noah Misch wrote:
> > Adding an event will renumber others, which can make an extension
> > report the wrong event until recompiled. Extensions citus, pg_bulkload,
> > and vector refer to static events. If a back-patch added
> > WAIT_EVENT_MESSAGE_QUEUE_SOMETHING_NEW, an old-build pg_bulkload report
> > of WAIT_EVENT_PARALLEL_CREATE_INDEX_SCAN would show up in
> > pg_stat_activity as WAIT_EVENT_PARALLEL_BITMAP_SCAN.
> > (WAIT_EVENT_EXTENSION is not part of a generated enum, fortunately.)

Nice catch, thanks!

> I see an option (4), similar to your (2) without the per-line
> annotation: add a new magic keyword like the existing "Section" that
> is used in the first lines of generate-wait_event_types.pl where we
> generate tab-separated lines with the section name as prefix of each
> line. So I can think of something like:
>
> Section: ClassName - WaitEventFoo
> FOO_1 "Waiting in foo1"
> FOO_2 "Waiting in foo2"
> Backpatch:
> BAR_1 "Waiting in bar1"
> BAR_2 "Waiting in bar2"
>
> Then force the ordering for the docs and keep the elements in the
> backpatch section at the end of the enums in the order in the txt.

Yeah, I think that's a good idea.

> One thing that could make sense is to enforce that "Backpatch" is at
> the end of a section, meaning that we would need a second keyword like
> a "Section: EndBackpatch" or something like that. That's not strictly
> necessary IMO as the format of the txt is easy to follow.

I gave it a try in the POC patch attached. I did not use an "EndBackpatch" section, to keep the perl script as simple as possible (but documented the expectation instead).
Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From f93412201df9fa9156cee0b2492a25ac89ff0921 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Mon, 18 Mar 2024 08:34:19 + Subject: [PATCH v1] Add a "Backpatch" section in wait_event_names.txt When backpatching, adding an event will renumber others, which can make an extension report the wrong event until recompiled. This is due to the fact that generate-wait_event_types.pl sorts events to position them. The "Backpatch" section added here ensures no ordering is done for the wait events found after this delimiter. --- .../activity/generate-wait_event_types.pl | 26 ++- .../utils/activity/wait_event_names.txt | 10 +-- 2 files changed, 33 insertions(+), 3 deletions(-) 100.0% src/backend/utils/activity/ diff --git a/src/backend/utils/activity/generate-wait_event_types.pl b/src/backend/utils/activity/generate-wait_event_types.pl index f1adf0e8e7..d129d94889 100644 --- a/src/backend/utils/activity/generate-wait_event_types.pl +++ b/src/backend/utils/activity/generate-wait_event_types.pl @@ -38,7 +38,9 @@ die "Not possible to specify --docs and --code simultaneously" open my $wait_event_names, '<', $ARGV[0] or die; +my @backpatch_lines; my @lines; +my $backpatch = 0; my $section_name; my $note; my $note_name; @@ -59,10 +61,26 @@ while (<$wait_event_names>) { $section_name = $_; $section_name =~ s/^.*- //; + $backpatch = 0; next; } - push(@lines, $section_name . "\t" . $_); + # Are we dealing with backpatch wait events? + if (/^Backpatch:$/) + { + $backpatch = 1; + next; + } + + # Backpatch wait events won't be sorted during code generation + if ($gen_code && $backpatch) + { + push(@backpatch_lines, $section_name . "\t" . $_); + } + else + { + push(@lines, $section_name . "\t" . $_); + } } # Sort the lines based on the second column. 
@@ -70,6 +88,12 @@ while (<$wait_event_names>) my @lines_sorted = sort { uc((split(/\t/, $a))[1]) cmp uc((split(/\t/, $b))[1]) } @lines; +# If we are generating code then concat @lines_sorted and @backpatch_lines. +if ($gen_code) +{ + push(@lines_sorted, @backpatch_lines); +} + # Read the sorted lines and populate the hash table foreach my $line (@lines_sorted) { diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt index c08e00d1d6..62c835df2e 100644 --- a/src/backend/utils/activity/wait_event_names.txt +++ b/src/backend/utils/activity/wait_event_names.txt @@ -24,7 +24,12 @@ # SGML tables of wait events for inclusion in the documentation. # # When adding a new wait event, make sure it is placed in the appropriate -# ClassName section. +# ClassName section. If the wait event is backpatched to a version < 17 then +# put it under a "Backpatch" delimiter at the end of the related ClassName +# section. +# Ensure that the wait events under the "Backpatch" delimiter are placed in the +# same order as in the pre 17 wait_event_types.h (see VERSION_FILE_SYNC as an +# example). # # WaitEventLWLock and WaitEventLock have their own C code for their wait event # enums and function names. Hence, for
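[Editorial sketch] The renumbering hazard this patch guards against can be made concrete with a small model (toy event names, Python rather than the Perl generator): sorting the whole list on every build shifts enum values when a back-branch inserts a new event, whereas appending backpatched names after the sorted block keeps existing values stable.

```python
def enum_values(names, backpatched=()):
    """Assign enum values: sorted main block, then backpatched names
    appended unsorted at the end, as the patch's Backpatch section does."""
    ordered = sorted(names) + list(backpatched)
    return {name: i for i, name in enumerate(ordered)}

old = enum_values(["BAR", "FOO", "QUUX"])
# Naive approach: the new event BAZ sorts into the middle and renumbers QUUX,
# so an extension compiled against `old` now reports the wrong event.
renumbered = enum_values(["BAR", "BAZ", "FOO", "QUUX"])
# Backpatch section: BAZ goes to the end; existing values are unchanged.
stable = enum_values(["BAR", "FOO", "QUUX"], backpatched=["BAZ"])

print(old["QUUX"], renumbered["QUUX"], stable["QUUX"])  # -> 2 3 2
```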
Re: Skip collecting decoded changes of already-aborted transactions
On Fri, Mar 15, 2024 at 1:21 PM Ajin Cherian wrote: > > > > On Fri, Mar 15, 2024 at 3:17 PM Masahiko Sawada wrote: >> >> >> I resumed working on this item. I've attached the new version patch. >> >> I rebased the patch to the current HEAD and updated comments and >> commit messages. The patch is straightforward and I'm somewhat >> satisfied with it, but I'm thinking of adding some tests for it. >> >> Regards, >> >> -- >> Masahiko Sawada >> Amazon Web Services: https://aws.amazon.com > > > I just had a look at the patch, the patch no longer applies because of a > removal of a header in a recent commit. Overall the patch looks fine, and I > didn't find any issues. Some cosmetic comments: Thank you for your review comments. > in ReorderBufferCheckTXNAbort() > + /* Quick return if we've already knew the transaction status */ > + if (txn->aborted) > + return true; > > knew/know Maybe it should be "known"? > > /* > + * If logical_replication_mode is "immediate", we don't check the > + * transaction status so the caller always process this transaction. > + */ > + if (debug_logical_replication_streaming == > DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE) > + return false; > > /process/processes > Fixed. In addition to these changes, I've made some changes to the latest patch. Here is the summary: - Use txn_flags field to record the transaction status instead of two 'committed' and 'aborted' flags. - Add regression tests. - Update commit message. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com v4-0001-Skip-logical-decoding-of-already-aborted-transact.patch Description: Binary data
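[Editorial sketch] The "use txn_flags instead of two booleans" change in the summary amounts to packing transaction status into flag bits. A minimal model (bit values below are invented for illustration, not the actual ReorderBuffer constants):

```python
# Hypothetical flag bits standing in for the real txn_flags values.
RBTXN_IS_COMMITTED = 1 << 0
RBTXN_IS_ABORTED   = 1 << 1

def is_aborted(txn_flags: int) -> bool:
    """Quick status check against the packed flags field."""
    return bool(txn_flags & RBTXN_IS_ABORTED)

flags = 0
flags |= RBTXN_IS_ABORTED  # mark the transaction aborted once detected
print(is_aborted(flags), bool(flags & RBTXN_IS_COMMITTED))  # -> True False
```

The benefit over separate booleans is that one field carries all mutually related status bits and can be tested or extended without changing the struct layout.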
Re: Check lateral references within PHVs for memoize cache keys
On Fri, Feb 2, 2024 at 5:18 PM Richard Guo wrote: > The v4 patch does not apply any more. I've rebased it on master. > Nothing else has changed. > Here is another rebase over master so it applies again. I also added a commit message to help review. Nothing else has changed. Thanks Richard v6-0001-Check-lateral-refs-within-PHVs-for-memoize-cache-keys.patch Description: Binary data
Re: Support json_errdetail in FRONTEND builds
> On 18 Mar 2024, at 01:13, Tom Lane wrote:
>
> Daniel Gustafsson writes:
>> I took another look at this tonight and committed it with some mostly
>> cosmetic changes. Since then, tamandua failed the SSL test on this
>> commit but I am unable to reproduce any error both on older OpenSSL
>> and matching the 3.1 version on tamandua, so not sure if it's related.
>
> That failure looks like just a random buildfarm burp:

Indeed, and it corrected itself a few hours later.
--
Daniel Gustafsson
Re: What about Perl autodie?
> On 18 Mar 2024, at 07:27, Peter Eisentraut wrote: > After some pondering, I figured the exclude list is better. Agreed. > So here is a squashed patch, also with a complete commit message. Looks good from a read-through. It would have been nice to standardize on using one of "|| die" and "or die" consistently but that's clearly not for this body of work. -- Daniel Gustafsson
Re: Remove a FIXME and unused variables in Meson
On 14.03.24 06:13, Tristan Partin wrote:
> One of them adds version gates to two LLVM flags (-fwrapv,
> -fno-strict-aliasing). I believe we moved the minimum LLVM version
> recently, so these might not be necessary, but maybe it helps for
> historical reasons. If not, I'll just remove the comment in a different
> patch.

We usually remove version gates once the overall minimum required version is new enough. So this doesn't seem like a step in the right direction.

> Second patch removes some unused variables. Were they analogous to
> things in autotools and the Meson portions haven't been added yet?

Hmm, yeah, no idea. These were not used even in the first commit for Meson support. Might have had a purpose in earlier draft patches.
Re: MERGE ... RETURNING
On Fri, 15 Mar 2024 at 17:14, Jeff Davis wrote: > > On Fri, 2024-03-15 at 11:20 +, Dean Rasheed wrote: > > > So barring any further objections, I'd like to go ahead and get this > > patch committed. > > I like this feature from a user perspective. So +1 from me. > I have committed this. Thanks for all the feedback everyone. Regards, Dean
Re: remaining sql/json patches
I have tested a nested case, but why is a negative number allowed in the subscript (NESTED '$.phones[-1]' COLUMNS)? It should error out if the number is negative.

SELECT * FROM JSON_TABLE(jsonb '{
  "id" : "0.234567897890",
  "name" : { "first":"John", "last":"Doe" },
  "phones" : [{"type":"home", "number":"555-3762"},
              {"type":"work", "number":"555-7252", "test":123}]}',
  '$'
  COLUMNS(
    id numeric(2,2) PATH 'lax $.id',
    last_name varCHAR(10) PATH 'lax $.name.last',
    first_name VARCHAR(10) PATH 'lax $.name.first',
    NESTED '$.phones[-1]' COLUMNS (
      "type" VARCHAR(10),
      "number" VARCHAR(10)
    )
  )
) as t;

  id  | last_name | first_name | type | number
------+-----------+------------+------+--------
 0.23 | Doe       | Johnnn     |      |
(1 row)

Thanks,
Himanshu
Re: Catalog domain not-null constraints
Anyway, in order to move this forward, here is an updated patch where the ADD CONSTRAINT ... NOT NULL behavior for domains matches the idempotent behavior of tables. This uses the patch that Jian He posted. From a0075e4bcd5f2db292f92fc2b70576ccebd07210 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Thu, 23 Nov 2023 07:35:32 +0100 Subject: [PATCH v4 1/2] Add tests for domain-related information schema views Discussion: https://www.postgresql.org/message-id/flat/9ec24d7b-633d-463a-84c6-7acff769c9e8%40eisentraut.org --- src/test/regress/expected/domain.out | 47 src/test/regress/sql/domain.sql | 24 ++ 2 files changed, 71 insertions(+) diff --git a/src/test/regress/expected/domain.out b/src/test/regress/expected/domain.out index 6d94e84414a..e70aebd70c0 100644 --- a/src/test/regress/expected/domain.out +++ b/src/test/regress/expected/domain.out @@ -1207,3 +1207,50 @@ create domain testdomain1 as int constraint unsigned check (value > 0); alter domain testdomain1 rename constraint unsigned to unsigned_foo; alter domain testdomain1 drop constraint unsigned_foo; drop domain testdomain1; +-- +-- Information schema +-- +SELECT * FROM information_schema.column_domain_usage + WHERE domain_name IN ('con', 'dom', 'pos_int', 'things') + ORDER BY domain_name; + domain_catalog | domain_schema | domain_name | table_catalog | table_schema | table_name | column_name ++---+-+---+--++- + regression | public| con | regression| public | domcontest | col1 + regression | public| dom | regression| public | domview| col1 + regression | public| things | regression| public | thethings | stuff +(3 rows) + +SELECT * FROM information_schema.domain_constraints + WHERE domain_name IN ('con', 'dom', 'pos_int', 'things') + ORDER BY constraint_name; + constraint_catalog | constraint_schema | constraint_name | domain_catalog | domain_schema | domain_name | is_deferrable | initially_deferred ++---+-++---+-+---+ + regression | public| con_check | regression | public| con | NO| NO + regression | 
public| meow| regression | public| things | NO| NO + regression | public| pos_int_check | regression | public| pos_int | NO| NO +(3 rows) + +SELECT * FROM information_schema.domains + WHERE domain_name IN ('con', 'dom', 'pos_int', 'things') + ORDER BY domain_name; + domain_catalog | domain_schema | domain_name | data_type | character_maximum_length | character_octet_length | character_set_catalog | character_set_schema | character_set_name | collation_catalog | collation_schema | collation_name | numeric_precision | numeric_precision_radix | numeric_scale | datetime_precision | interval_type | interval_precision | domain_default | udt_catalog | udt_schema | udt_name | scope_catalog | scope_schema | scope_name | maximum_cardinality | dtd_identifier ++---+-+---+--++---+--++---+--++---+-+---++---+++-++--+---+--++-+ + regression | public| con | integer | || | | | | || 32 | 2 | 0 || ||| regression | pg_catalog | int4 | | || | 1 + regression | public| dom | integer | || | | | | || 32 | 2 | 0 || ||| regression | pg_catalog | int4 | | || | 1 + regression | public| pos_int | integer | || | | | | |
Re: pg_column_toast_chunk_id: a function to get a chunk ID of a TOASTed value
On Thu, 14 Mar 2024 11:10:42 -0500 Nathan Bossart wrote: > On Wed, Mar 13, 2024 at 01:09:18PM +0700, Yugo NAGATA wrote: > > On Tue, 12 Mar 2024 22:07:17 -0500 > > Nathan Bossart wrote: > >> I did some light editing to prepare this for commit. > > > > Thank you. I confirmed the test you improved and I am fine with that. > > Committed. Thank you! > > -- > Nathan Bossart > Amazon Web Services: https://aws.amazon.com -- Yugo NAGATA
Re: speed up a logical replica setup
On Sat, 16 Mar 2024 at 21:13, Euler Taveira wrote: > > On Fri, Mar 15, 2024, at 3:34 AM, Amit Kapila wrote: > > Did you consider adding options for publication/subscription/slot > names as mentioned in my previous email? As discussed in a few emails > above, it would be quite confusing for users to identify the logical > replication objects once the standby is converted to subscriber. > > > Yes. I was wondering to implement after v1 is pushed. I started to write a > code > for it but I wasn't sure about the UI. The best approach I came up with was > multiple options in the same order. (I don't provide short options to avoid > possible portability issues with the order.) It means if I have 3 databases > and > the following command-line: > > pg_createsubscriber ... --database pg1 --database pg2 --database3 > --publication > pubx --publication puby --publication pubz > > pubx, puby and pubz are created in the database pg1, pg2, and pg3 > respectively. > > It seems we care only for publications created on the primary. Isn't > it possible that some of the publications have been replicated to > standby by that time, for example, in case failure happens after > creating a few publications? IIUC, we don't care for standby cleanup > after failure because it can't be used for streaming replication > anymore. So, the only choice the user has is to recreate the standby > and start the pg_createsubscriber again. This sounds questionable to > me as to whether users would like this behavior. Does anyone else have > an opinion on this point? > > > If it happens after creating a publication and before promotion, the cleanup > routine will drop the publications on primary and it will eventually be > applied > to the standby via replication later. > > I see the below note in the patch: > +If pg_createsubscriber fails while processing, > +then the data directory is likely not in a state that can be recovered. > It > +is true if the target server was promoted. 
In such a case, creating a new > +standby server is recommended. > > By reading this it is not completely clear whether the standby is not > recoverable in case of any error or only an error after the target > server is promoted. If others agree with this behavior then we should > write the detailed reason for this somewhere in the comments as well > unless it is already explained. > > > I rewrote the sentence to make it clear that only if the server is promoted, > the target server will be in a state that cannot be reused. It provides a > message saying it too. > > pg_createsubscriber: target server reached the consistent state > pg_createsubscriber: hint: If pg_createsubscriber fails after this point, you > must recreate the physical replica before continuing. > > I'm attaching a new version (v30) that adds: > > * 3 new options (--publication, --subscription, --replication-slot) to assign > names to the objects. The --database option used to ignore duplicate names, > however, since these new options rely on the number of database options to > match the number of object name options, it is forbidden from now on. The > duplication is also forbidden for the object names to avoid errors earlier. > * rewrite the paragraph related to unusable target server after > pg_createsubscriber fails.
1) Maximum size of the object name is 64, we can have a check so that we don't specify more than the maximum allowed length:

+			case 3:
+				if (!simple_string_list_member(_names, optarg))
+				{
+					simple_string_list_append(_names, optarg);
+					num_replslots++;
+				}
+				else
+				{
+					pg_log_error("duplicate replication slot \"%s\"", optarg);
+					exit(1);
+				}
+				break;

If we allow something like this:

./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431 user=postgres" -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d db2 -d db3 --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1" --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2" --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3"

In this case creation of replication slot will fail:

pg_createsubscriber: error: could not create replication slot "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes" on database "db2": ERROR: replication slot "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes" already exists

2) Similarly here too:

+			case 4:
+				if (!simple_string_list_member(_names, optarg))
+				{
+					simple_string_list_append(_names, optarg);
+					num_subs++;
+				}
+				else
+				{
+					pg_log_error("duplicate subscription \"%s\"", optarg);
+					exit(1);
+				}
+				break;

If we allow something like this:

./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431 user=postgres" -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d db2 -d db3 --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1 --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2
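The "already exists" failure above follows from how PostgreSQL stores identifiers: names live in a fixed NAMEDATALEN (64 by default) buffer, so anything longer than 63 bytes is truncated, and the three 64-character slot names collapse to the same 63-character prefix. A simplified sketch of that behavior (the real server code, in the style of truncate_identifier, is multibyte-aware and emits a NOTICE; both details are omitted here):

```c
#include <string.h>

#define NAMEDATALEN 64			/* PostgreSQL's default */

/*
 * Simplified sketch: identifiers of NAMEDATALEN bytes or more are cut
 * down to NAMEDATALEN - 1 bytes (63 by default).
 */
static void
truncate_identifier_sketch(char *name)
{
	if (strlen(name) >= NAMEDATALEN)
		name[NAMEDATALEN - 1] = '\0';
}
```

This is why validating the option length up front, as suggested in the review, would give a clearer error than the later "already exists" failure.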
Re: Improve readability by using designated initializers when possible
On 14.03.24 01:26, Michael Paquier wrote: -EventTriggerSupportsObjectClass(ObjectClass objclass) +EventTriggerSupportsObject(const ObjectAddress *object) The shortcut introduced here is interesting, but it is inconsistent. HEAD treats OCLASS_SUBSCRIPTION as something supported by event triggers, but as pg_subscription is a shared catalog it would be discarded with your change. Subscriptions are marked as supported in the event trigger table: https://www.postgresql.org/docs/devel/event-trigger-matrix.html Ah, good catch. Subscriptions are a little special there. Here is a new patch that keeps the switch/case arrangement in that function. That also makes it easier to keep the two EventTriggerSupports... functions aligned. Also added a note about subscriptions and a reference to the documentation. From 3a17f075f9ccac2eb7a969be5e455bb29a9e40e2 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 18 Mar 2024 08:07:37 +0100 Subject: [PATCH v4] Remove ObjectClass ObjectClass is an enum whose values correspond to catalog OIDs. But the extra layer of redirection, which is used only in small parts of the code, and the similarity to ObjectType, are confusing and cumbersome. One advantage has been that some switches processing the OCLASS enum don't have "default:" cases. This is so that the compiler tells us when we fail to add support for some new object class. But you can also handle that with some assertions and proper test coverage. It's not even clear how strong this benefit is. For example, in AlterObjectNamespace_oid(), you could still put a new OCLASS into the "ignore object types that don't have schema-qualified names" case, and it might or might not be wrong. Also, there are already various OCLASS switches that do have a default case, so it's not even clear what the preferred coding style should be. 
Discussion: https://www.postgresql.org/message-id/flat/CAGECzQT3caUbcCcszNewCCmMbCuyP7XNAm60J3ybd6PN5kH2Dw%40mail.gmail.com --- src/backend/catalog/dependency.c | 239 src/backend/catalog/objectaddress.c | 316 --- src/backend/commands/alter.c | 73 ++- src/backend/commands/event_trigger.c | 130 ++- src/backend/commands/tablecmds.c | 62 ++ src/include/catalog/dependency.h | 51 - src/include/commands/event_trigger.h | 2 +- src/tools/pgindent/typedefs.list | 1 - 8 files changed, 233 insertions(+), 641 deletions(-) diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c index eadcf6af0df..6e8f6a57051 100644 --- a/src/backend/catalog/dependency.c +++ b/src/backend/catalog/dependency.c @@ -206,7 +206,7 @@ deleteObjectsInList(ObjectAddresses *targetObjects, Relation *depRel, if (extra->flags & DEPFLAG_REVERSE) normal = true; - if (EventTriggerSupportsObjectClass(getObjectClass(thisobj))) + if (EventTriggerSupportsObject(thisobj)) { EventTriggerSQLDropAddObject(thisobj, original, normal); } @@ -1349,9 +1349,9 @@ deleteOneObject(const ObjectAddress *object, Relation *depRel, int flags) static void doDeletion(const ObjectAddress *object, int flags) { - switch (getObjectClass(object)) + switch (object->classId) { - case OCLASS_CLASS: + case RelationRelationId: { charrelKind = get_rel_relkind(object->objectId); @@ -1382,104 +1382,102 @@ doDeletion(const ObjectAddress *object, int flags) break; } - case OCLASS_PROC: + case ProcedureRelationId: RemoveFunctionById(object->objectId); break; - case OCLASS_TYPE: + case TypeRelationId: RemoveTypeById(object->objectId); break; - case OCLASS_CONSTRAINT: + case ConstraintRelationId: RemoveConstraintById(object->objectId); break; - case OCLASS_DEFAULT: + case AttrDefaultRelationId: RemoveAttrDefaultById(object->objectId); break; - case OCLASS_LARGEOBJECT: + case LargeObjectRelationId: LargeObjectDrop(object->objectId); break; - case OCLASS_OPERATOR: + case OperatorRelationId: RemoveOperatorById(object->objectId); 
break; - case OCLASS_REWRITE: + case RewriteRelationId: RemoveRewriteRuleById(object->objectId); break; -
RE: PostgreSQL 17 Release Management Team & Feature Freeze
Dear Michael,

> If you think that this is OK, and as far as I can see this looks OK on
> the thread, then this open item should be moved under "resolved before
> 17beta1", mentioning the commit involved in the fix. Please see [1]
> for examples.

OK, I understood that I could wait for your check. Thanks.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/
Re: What about Perl autodie?
On Mon, Mar 18, 2024 at 2:28 AM Peter Eisentraut wrote: > On 21.02.24 08:26, Peter Eisentraut wrote: > > On 14.02.24 17:52, Peter Eisentraut wrote: > >> A gentler way might be to start using some perlcritic policies like > >> InputOutput::RequireCheckedOpen or the more general > >> InputOutput::RequireCheckedSyscalls and add explicit error checking at > >> the sites it points out. > > > > Here is a start for that. I added the required stanza to perlcriticrc > > and started with an explicit list of functions to check: > > > > functions = chmod flock open read rename seek symlink system > > > > and fixed all the issues it pointed out. > > > > I picked those functions because most existing code already checked > > those, so the omissions are probably unintended, or in some cases also > > because I thought it would be important for test correctness (e.g., some > > tests using chmod). > > > > I didn't design any beautiful error messages, mostly just used "or die > > $!", which mostly matches existing code, and also this is > > developer-level code, so having the system error plus source code > > reference should be ok. > > > > In the second patch, I changed the perlcriticrc stanza to use an > > exclusion list instead of an explicit inclusion list. That way, you can > > see what we are currently *not* checking. I'm undecided which way > > around is better, and exactly what functions we should be checking. (Of > > course, in principle, all of them, but since this is test and build > > support code, not production code, there are probably some reasonable > > compromises to be made.) > > After some pondering, I figured the exclude list is better. So here is > a squashed patch, also with a complete commit message. > > Btw., do we check perlcritic in an automated way, like on the buildfarm? Yes. crake and koel do. cheers andrew
Re: What about Perl autodie?
On 21.02.24 08:26, Peter Eisentraut wrote: On 14.02.24 17:52, Peter Eisentraut wrote: A gentler way might be to start using some perlcritic policies like InputOutput::RequireCheckedOpen or the more general InputOutput::RequireCheckedSyscalls and add explicit error checking at the sites it points out. Here is a start for that. I added the required stanza to perlcriticrc and started with an explicit list of functions to check: functions = chmod flock open read rename seek symlink system and fixed all the issues it pointed out. I picked those functions because most existing code already checked those, so the omissions are probably unintended, or in some cases also because I thought it would be important for test correctness (e.g., some tests using chmod). I didn't design any beautiful error messages, mostly just used "or die $!", which mostly matches existing code, and also this is developer-level code, so having the system error plus source code reference should be ok. In the second patch, I changed the perlcriticrc stanza to use an exclusion list instead of an explicit inclusion list. That way, you can see what we are currently *not* checking. I'm undecided which way around is better, and exactly what functions we should be checking. (Of course, in principle, all of them, but since this is test and build support code, not production code, there are probably some reasonable compromises to be made.) After some pondering, I figured the exclude list is better. So here is a squashed patch, also with a complete commit message. Btw., do we check perlcritic in an automated way, like on the buildfarm?From c70c6af0e369496bec04d20dbe42d09a233f046f Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Sun, 17 Mar 2024 16:10:50 +0100 Subject: [PATCH v2] Activate perlcritic InputOutput::RequireCheckedSyscalls and fix resulting warnings This checks that certain I/O-related Perl functions properly check their return value. 
Some parts of the PostgreSQL code had been a bit sloppy about that. The new perlcritic warnings are fixed here. I didn't design any beautiful error messages, mostly just used "or die $!", which mostly matches existing code, and also this is developer-level code, so having the system error plus source code reference should be ok. Initially, we only activate this check for a subset of what the perlcritic check would warn about. The effective list is chmod flock open read rename seek symlink system The initial set of functions is picked because most existing code already checked the return value of those, so any omissions are probably unintended, or because it seems important for test correctness. The actual perlcritic configuration is written as an exclude list. That seems better so that we are clear on what we are currently not checking. Maybe future patches want to investigate checking some of the other functions. (In principle, we might eventually want to check all of them, but since this is test and build support code, not production code, there are probably some reasonable compromises to be made.) 
Discussion: https://www.postgresql.org/message-id/flat/88b7d4f2-46d9-4cc7-b1f7-613c90f9a76a%40eisentraut.org --- .../t/010_pg_archivecleanup.pl | 2 +- src/bin/pg_basebackup/t/010_pg_basebackup.pl| 8 src/bin/pg_ctl/t/001_start_stop.pl | 2 +- src/bin/pg_resetwal/t/002_corrupted.pl | 2 +- src/bin/pg_rewind/t/009_growing_files.pl| 2 +- src/bin/pg_rewind/t/RewindTest.pm | 4 ++-- src/pl/plperl/text2macro.pl | 4 ++-- src/test/kerberos/t/001_auth.pl | 2 +- .../ssl_passphrase_callback/t/001_testfunc.pl | 2 +- src/test/perl/PostgreSQL/Test/Cluster.pm| 12 ++-- src/test/perl/PostgreSQL/Test/Utils.pm | 16 src/test/ssl/t/SSL/Server.pm| 10 +- src/tools/msvc_gendef.pl| 4 ++-- src/tools/perlcheck/perlcriticrc| 8 src/tools/pgindent/pgindent | 17 + 15 files changed, 52 insertions(+), 43 deletions(-) diff --git a/src/bin/pg_archivecleanup/t/010_pg_archivecleanup.pl b/src/bin/pg_archivecleanup/t/010_pg_archivecleanup.pl index 792f5677c87..91a98c71e99 100644 --- a/src/bin/pg_archivecleanup/t/010_pg_archivecleanup.pl +++ b/src/bin/pg_archivecleanup/t/010_pg_archivecleanup.pl @@ -36,7 +36,7 @@ sub create_files { foreach my $fn (map { $_->{name} } @_) { - open my $file, '>', "$tempdir/$fn"; + open my $file, '>', "$tempdir/$fn" or die $!; print $file 'CONTENT'; close $file; diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl index d3c83f26e4b..490a9822f09 100644 --- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl +++
Re: Weird test mixup
On Mon, Mar 18, 2024 at 10:50:25AM +0500, Andrey M. Borodin wrote:
> Maybe consider function injection_points_attach_local('point name')
> instead of static switch?
> Or even injection_points_attach_global('point name'), while function
> injection_points_attach('point name') will be global? This would
> favour writing concurrent tests by default…

The point is to limit accidents like the one of this thread. So, for cases already in the tree, not giving the point name in the SQL function would be simple enough. What you are suggesting can be simply done, as well, though I'd rather wait for a reason to justify doing so.
--
Michael
signature.asc Description: PGP signature
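For context, the "local" idea discussed here amounts to scoping an injection point to the backend that attached it, so concurrent tests cannot trip over each other's points. A hypothetical sketch of such a check (the struct and field names are invented for illustration; the actual machinery lives in the injection_points test module):

```c
#include <stdbool.h>

/*
 * Hypothetical registry entry: owner_pid == 0 means the point fires in
 * every backend (global), otherwise only in the backend that attached it.
 */
typedef struct InjectionPointSketch
{
	const char *name;
	int			owner_pid;
} InjectionPointSketch;

static bool
point_fires_in(const InjectionPointSketch *point, int backend_pid)
{
	return point->owner_pid == 0 || point->owner_pid == backend_pid;
}
```

With a check like this, a point attached "locally" by one test backend is invisible to concurrent backends, which is what would favour writing concurrent tests by default.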
Re: Switching XLog source from archive to streaming when primary available
On Sun, Mar 17, 2024 at 11:37:58AM +0530, Bharath Rupireddy wrote:
> Rebase needed after 071e3ad59d6fd2d6d1277b2bd9579397d10ded28 due to a
> conflict in meson.build. Please see the attached v23 patch.

I've been reading this patch, and this is a very tricky one. Please be *very* cautious.

+#streaming_replication_retry_interval = 0	# time after which standby
+				# attempts to switch WAL source from archive to
+				# streaming replication in seconds; 0 disables

This stuff allows a minimal retry interval of 1s. Could it be useful to have more responsiveness here and allow lower values than that? Why not switching the units to be milliseconds?

+	if (streaming_replication_retry_interval <= 0 ||
+		!StandbyMode ||
+		currentSource != XLOG_FROM_ARCHIVE)
+		return SWITCH_TO_STREAMING_NONE;

Hmm. Perhaps this should mention why we don't care about the consistent point.

+	/* See if we can switch WAL source to streaming */
+	if (wal_source_switch_state == SWITCH_TO_STREAMING_NONE)
+		wal_source_switch_state = SwitchWALSourceToPrimary();

Rather than a routine that returns as result the value to use for the GUC, I'd suggest to let this routine set the GUC as there is only one caller of SwitchWALSourceToPrimary(). This can also include a check on SWITCH_TO_STREAMING_NONE, based on what I'm reading that.

-	if (lastSourceFailed)
+	if (lastSourceFailed ||
+		wal_source_switch_state == SWITCH_TO_STREAMING)

Hmm. This one may be tricky. I'd recommend a separation between the failure in reading from a source and the switch to a new "forced" source.

+	if (wal_source_switch_state == SWITCH_TO_STREAMING_PENDING)
+		readFrom = XLOG_FROM_PG_WAL;
+	else
+		readFrom = currentSource == XLOG_FROM_ARCHIVE ?
+			XLOG_FROM_ANY : currentSource;

WALSourceSwitchState looks confusing here, and are you sure that this is actually correct? Shouldn't we still try a READ_FROM_ANY or a read from the archives even with a streaming pending. By the way, I am not convinced that what you have is the best interface ever.
This assumes that we'd always want to switch to streaming more aggressively. Could there be a point in also controlling if we should switch to pg_wal/ or just to archiving more aggressively as well, aka be able to do the opposite switch of WAL source? This design looks somewhat limited to me. The origin of the issue is that we don't have a way to control the order of the sources consumed by WAL replay. Perhaps something like a replay_source_order that uses a list would be better, with elements settable to archive, pg_wal and streaming? -- Michael signature.asc Description: PGP signature
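The replay_source_order idea floated above could, for example, reduce to walking a parsed list of sources instead of a hard-coded state machine. A hypothetical sketch of the selection logic (the GUC name, enum values, and list handling are all invented for illustration, not taken from the patch):

```c
#include <string.h>

/* hypothetical WAL source identifiers, mirroring archive/pg_wal/streaming */
typedef enum
{
	SRC_ARCHIVE,
	SRC_PG_WAL,
	SRC_STREAMING,
	SRC_INVALID
} WalSource;

/* hypothetical parser for one element of replay_source_order */
static WalSource
parse_source(const char *tok)
{
	if (strcmp(tok, "archive") == 0)
		return SRC_ARCHIVE;
	if (strcmp(tok, "pg_wal") == 0)
		return SRC_PG_WAL;
	if (strcmp(tok, "streaming") == 0)
		return SRC_STREAMING;
	return SRC_INVALID;
}

/*
 * Pick the next source to try: start after the one that just failed and
 * wrap around, so replay cycles through the configured order.
 */
static WalSource
next_source(const WalSource *order, int norder, WalSource failed)
{
	for (int i = 0; i < norder; i++)
	{
		if (order[i] == failed)
			return order[(i + 1) % norder];
	}
	return order[0];			/* failed source not listed: restart the cycle */
}
```

With such a list, both the "switch to streaming aggressively" case and its opposite fall out of the configured order rather than needing dedicated switch-state machinery.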