Impact of checkpointer during pg_upgrade
During pg_upgrade, we start the server for the old cluster, which allows the checkpointer to remove WAL files. It has been noticed that we generate certain types of WAL records (e.g., XLOG_RUNNING_XACTS, XLOG_CHECKPOINT_ONLINE, and XLOG_FPI_FOR_HINT) for the old cluster even during pg_upgrade, so these additional WAL records could let the checkpointer decide that certain WAL segments can be removed (e.g., if the WAL size crosses max_slot_wal_keep_size) and invalidate the slots.

Currently, I can't see any problem with this, but for future work where we want to migrate logical slots during an upgrade [1], we need to decide what to do in such cases. The initial idea we had was that if the old cluster has some invalid slots, we won't allow an upgrade unless the user removes such slots or uses some option like --exclude-slots. But it is quite possible that slots get invalidated during pg_upgrade itself, with no user activity involved. Even though the possibility of that is small, I think it is worth considering what the behavior should be.

The other possibilities, apart from not allowing an upgrade in such a case, could be: (a) before starting the old cluster, fetch the slots directly from disk using some tool like [2] and make the decisions based on that state; (b) during the upgrade, don't allow WAL to be removed if it can invalidate slots; (c) copy/migrate the invalid slots as well, but for that we need to expose an API to invalidate the slots; (d) somehow distinguish the slots that are invalidated during an upgrade and simply copy such slots, because we ensure anyway that all the WAL required by a slot is sent before shutdown.

Thoughts?

[1] - https://www.postgresql.org/message-id/TYAPR01MB58664C81887B3AF2EB6B16E3F5939%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[2] - https://www.postgresql.org/message-id/flat/CALj2ACW0rV5gWK8A3m6_X62qH%2BVfaq5hznC%3Di0R5Wojt5%2Byhyw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
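For reference, the invalidation scenario described above -- the WAL distance from a slot's restart_lsn exceeding the configured keep size -- amounts to a simple distance check. The sketch below is a hedged illustration only: the function and parameter names are hypothetical and this is not the server's actual KeepLogSeg()/InvalidateObsoleteReplicationSlots() logic.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/*
 * Hypothetical helper: returns true if the WAL a slot still needs has
 * fallen further behind the current insert position than the configured
 * keep size, i.e. the checkpointer would be allowed to remove segments
 * the slot depends on and mark the slot invalid.
 */
static bool
slot_would_be_invalidated(XLogRecPtr insert_lsn,
                          XLogRecPtr slot_restart_lsn,
                          uint64_t max_slot_wal_keep_bytes)
{
    if (insert_lsn <= slot_restart_lsn)
        return false;           /* slot is fully caught up */
    return (insert_lsn - slot_restart_lsn) > max_slot_wal_keep_bytes;
}
```

Even the small number of records generated during a binary upgrade (XLOG_RUNNING_XACTS, XLOG_FPI_FOR_HINT, and so on) advances insert_lsn, so a slot that was healthy at shutdown can cross this threshold mid-upgrade with no user activity at all.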
Re: Extract numeric filed in JSONB more effectively
I think the last patch failed. I am not 100% sure. https://cirrus-ci.com/task/5464366154252288 says "Created 21 hours ago", so I assume it is the latest patch. The diff is in the Artifacts section; you can go to testrun/build/testrun/regress/regress/regression.diffs

diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/jsonb.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/jsonb.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/jsonb.out	2023-09-01 03:34:43.585036700 +
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/jsonb.out	2023-09-01 03:39:05.800452844 +
@@ -528,7 +528,7 @@
 (3 rows)
 SELECT (test_json -> 'field1')::int4 FROM test_jsonb WHERE json_type = 'object';
-ERROR:  cannot cast jsonb string to type integer
+ERROR:  unknown jsonb type: 1125096840
 SELECT (test_json -> 'field1')::bool FROM test_jsonb WHERE json_type = 'object';
 ERROR:  cannot cast jsonb string to type boolean
 \pset null ''
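The replacement error message in the diff -- "unknown jsonb type:" followed by a huge, garbage-looking number -- is the classic symptom of reading the type tag of a JsonbValue that was never filled in (or was stale). A minimal sketch of the kind of type dispatch that produces it; the enum and function here are illustrative stand-ins, not the server's actual jsonb code:

```c
#include <stdio.h>
#include <string.h>

/* Illustrative subset of jsonb value kinds; not the server's real enum. */
typedef enum
{
    jbvNull,
    jbvString,
    jbvNumeric,
    jbvBool
} JsonbValueType;

/*
 * Dispatch on the type tag.  If the value was never populated (or was
 * freed and overwritten), the tag is whatever bytes happened to be there,
 * so the default branch fires with a nonsense number like 1125096840.
 */
static const char *
jsonb_type_name(int type)
{
    switch (type)
    {
        case jbvNull:    return "null";
        case jbvString:  return "string";
        case jbvNumeric: return "numeric";
        case jbvBool:    return "boolean";
        default:
            fprintf(stderr, "unknown jsonb type: %d\n", type);
            return NULL;
    }
}
```

One plausible reading of the regression failure, then, is that the patch's new cast path hands an unpopulated value to such a switch, rather than a bug in the switch itself.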
Re: Eager page freeze criteria clarification
On Fri, Sep 1, 2023 at 12:34 PM Robert Haas wrote:
> If the table is being vacuumed frequently, then the last-vacuum-LSN
> will be newer, which means we'll freeze *more* aggressively.

Sort of like how, if you set autovacuum_vacuum_scale_factor low enough, with standard pgbench and heap fillfactor tuned, autovacuum will never think that the table doesn't need to be vacuumed. It will continually see enough dead heap-only tuples to get another autovacuum each time, though there won't be any LP_DEAD items at any point -- regardless of when VACUUM actually runs.

When I reported this a couple of years ago, I noticed that autovacuum would spin whenever I set autovacuum_vacuum_scale_factor to 0.02, but autovacuum would *never* run (outside of antiwraparound autovacuums) when it was set just a little higher (perhaps 0.03 or 0.04). So there was some inflection point at which its behavior totally changed.

> And I'm not sure why we want to do that. If the table is being
> vacuumed a lot, it's probably also being modified a lot, which
> suggests that we ought to be more cautious about freezing, rather
> than the reverse.

Why wouldn't it be both things at the same time, for the same table?

Why not also avoid setting pages all-visible? The WAL records aren't too much smaller than most freeze records these days -- 64 bytes on most systems. I realize that the rules for FPIs are a bit different when page-level checksums aren't enabled, but fundamentally it's the same situation. No?

-- 
Peter Geoghegan
Re: Why doesn't Vacuum FULL update the VM
On Fri, Sep 1, 2023 at 12:34 PM Melanie Plageman wrote:
> I don't see why the visibility map shouldn't be updated so that all of
> the pages show all visible and all frozen for this relation after the
> vacuum full.

There was a similar issue with COPY FREEZE. It was fixed relatively recently -- see commit 7db0cd21.

-- 
Peter Geoghegan
Re: SQL:2011 application time
On 9/1/23 21:56, Paul Jungwirth wrote: On 9/1/23 03:50, Vik Fearing wrote: On 9/1/23 11:30, Peter Eisentraut wrote: 1) If I write UNIQUE (a, b, c WITHOUT OVERLAPS), does the WITHOUT OVERLAPS clause attach to the last column, or to the whole column list? In the SQL standard, you can only have one period and it has to be listed last, so this question does not arise. But here we are building a more general facility to then build the SQL facility on top of. So I think it doesn't make sense that the range column must be last or that there can only be one. Also, your implementation requires at least one non-overlaps column, which also seems like a confusing restriction. I think the WITHOUT OVERLAPS clause should be per-column, so that something like UNIQUE (a WITHOUT OVERLAPS, b, c WITHOUT OVERLAPS) would be possible. Then the WITHOUT OVERLAPS clause would directly correspond to the choice between equality or overlaps operator per column. An alternative interpretation would be that WITHOUT OVERLAPS applies to the whole column list, and we would take it to mean, for any range column, use the overlaps operator, for any non-range column, use the equals operator. But I think this would be confusing and would prevent the case of using the equality operator for some ranges and the overlaps operator for some other ranges in the same key. I prefer the first option. That is: WITHOUT OVERLAPS applies only to the column or expression it is attached to, and need not be last in line. I agree. The second option seems confusing and is more restrictive. I think allowing multiple uses of `WITHOUT OVERLAPS` (and in any position) is a great recommendation that enables a lot of new functionality. Several books[1,2] about temporal databases describe a multi-dimensional temporal space (even beyond application time vs. system time), and the standard is pretty disappointing here. It's not a weird idea. But I just want to be explicit that this isn't something the standard describes. 
(I think everyone in the conversation so far understands that.) So far I've tried to be pretty scrupulous about following SQL:2011, although personally I'd rather see Postgres support this functionality. And it's not like it goes *against* what the standard says. But if there are any objections, I'd love to hear them before putting in the work. :-) I have no problem with a first version doing exactly what the standard says and expanding it later. If we allow multiple+anywhere WITHOUT OVERLAPS in PRIMARY KEY & UNIQUE constraints, then surely we also allow multiple+anywhere PERIOD in FOREIGN KEY constraints too. (I guess the standard switched keywords because a FK is more like "MUST OVERLAPS". :-) Seems reasonable. Also if you have multiple application-time dimensions we probably need to allow multiple FOR PORTION OF clauses. I think the syntax would be: UPDATE t FOR PORTION OF valid_at FROM ... TO ... FOR PORTION OF asserted_at FROM ... TO ... [...] SET foo = bar Does that sound okay? That sounds really cool. [1] C. J. Date, Hugh Darwen, Nikos Lorentzos. Time and Relational Theory, Second Edition: Temporal Databases in the Relational Model and SQL. 2nd edition, 2014. [2] Tom Johnston. Bitemporal Data: Theory and Practice. 2014. Thanks! I have ordered these books. -- Vik Fearing
Re: Why doesn't Vacuum FULL update the VM
On 9/1/23 21:34, Melanie Plageman wrote:
> Hi,
>
> I noticed that VACUUM FULL actually does freeze the tuples in the
> rewritten table (heap_freeze_tuple()) but then it doesn't mark them all
> visible or all frozen in the visibility map. I don't understand why. It
> seems like it would save us future work.

I have often wondered this as well, but obviously I haven't done anything about it.

> I don't see why the visibility map shouldn't be updated so that all of
> the pages show all visible and all frozen for this relation after the
> vacuum full.

It cannot just blindly mark everything all visible and all frozen, because it will copy over dead tuples that concurrent transactions are still allowed to see.

-- 
Vik Fearing
Re: sandboxing untrusted code
On Fri, 2023-09-01 at 09:12 -0400, Robert Haas wrote:
> Close but not quite. As you say, #2 does exercise privileges. Also,
> even if no privileges are exercised, you could still refer to
> CURRENT_ROLE, and I think you could also call a function like
> has_table_privilege. Your identity hasn't changed, but you're
> restricted from exercising some of your privileges. Really, you still
> have them, but they're just not available to you in that situation.

Which privileges are available in a sandboxed environment, exactly? Is it kind of like masking away all privileges except EXECUTE, or are other privileges available, like SELECT?

And the distinction that you are drawing between having the privileges but them (mostly) not being available, versus not having the privileges at all, is fairly subtle. Some examples showing why that distinction is important would be helpful.

> > Although your proposal sounds like a good security backstop, it feels
> > like it's missing the point that there are different _kinds_ of
> > functions. We already have the IMMUTABLE marker and we already have
> > runtime checks to make sure that immutable functions can't CREATE
> > TABLE; why not build on that mechanism or create new markers?
...
> Here, however, we can't trust the owners
> of functions to label those functions accurately.

Of course, but observe:

=# CREATE FUNCTION f(i INT) RETURNS INT IMMUTABLE LANGUAGE plpgsql AS $$
  BEGIN
    CREATE TABLE x(t TEXT);
    RETURN 42 + i;
  END;
$$;
=# SELECT f(2);
ERROR:  CREATE TABLE is not allowed in a non-volatile function
CONTEXT:  SQL statement "CREATE TABLE x(t TEXT)"
PL/pgSQL function f(integer) line 3 at SQL statement

The function f() is called at the top level, not as part of any index expression or other special context. But it fails to CREATE TABLE simply because that's not an allowed thing for an IMMUTABLE function to do. That tells me right away that my function isn't going to work, and I can rewrite it rather than waiting for some other user to say that it failed when run in a sandbox.

> It won't do for
> Alice to create a function and then apply the NICE_AND_SAFE marker to
> it.

You can if you always execute NICE_AND_SAFE functions in a sandbox. The difference is that it's always executed in a sandbox, rather than sometimes, so it will fail consistently.

> Now, in the case of a C function, things are a bit different. We can't
> inspect the generated machine code and know what the function does,
> because of that pesky halting problem. We could handle that either
> through function labeling, since only superusers can create C
> functions, or by putting checks directly in the C code. I was somewhat
> inclined toward the latter approach, but I'm not completely sure yet
> what makes sense. Thinking about your comments here made me realize
> that there are other procedural languages to worry about, too, like
> PL/python or PL/perl or PL/sh. Whatever we do for the C functions will
> have to be extended to those cases somehow as well. If we label
> functions, then we'll have to allow superusers only to label functions
> in these languages as well and make the default label "this is
> unsafe." If we put checks in the C code then I guess any given PL
> needs to certify that it knows about sandboxing or have all of its
> functions treated as unsafe. I think doing this at the C level would
> be better, strictly speaking, because it's more granular. Imagine a
> function that only conditionally does some prohibited action - it can
> be allowed to work in the cases where it does not attempt the
> prohibited operation, and blocked when it does. Labeling is
> all-or-nothing.

Here I'm getting a little lost in what you mean by "prohibited operation". Most languages mostly use SPI, and whatever sandboxing checks you do should work there, too. Are you talking about completely separate side effects like writing files or opening sockets?

Regards,
Jeff Davis
Re: Inefficiency in parallel pg_restore with many tables
On Fri, Sep 01, 2023 at 04:00:44PM -0400, Robert Haas wrote:
> In hindsight, I think that making binaryheap depend on Datum was a bad
> idea. I think that was my idea, and I think it wasn't very smart.
> Considering that people have coded to that decision up until now, it
> might not be too easy to change at this point. But in principle I
> guess you'd want to be able to make a heap out of any C data type,
> rather than just Datum, or just Datum in the backend and just void *
> in the frontend.

Yeah, something similar to simplehash for binary heaps could be nice. That being said, I don't know if there's a strong reason to specialize the implementation for a given C data type in most cases. I suspect many callers are just fine with dealing with pointers (e.g., I wouldn't store an entire TocEntry in the array), and smaller types like integers are already stored directly in the array thanks to the use of Datum. However, it _would_ allow us to abandon this frontend/backend void */Datum kludge, which is something.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
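A pointer-based interface along the lines discussed here could look roughly like the following: a minimal, type-agnostic binary min-heap over void * with a caller-supplied comparator. The names and layout are invented for illustration; this is not the binaryheap API or the proposed patch.

```c
#include <stdlib.h>

/* Caller-supplied comparator, strcmp-style: <0, 0, >0. */
typedef int (*heap_cmp) (const void *a, const void *b);

typedef struct
{
    void      **nodes;
    int         size;
    int         capacity;
    heap_cmp    cmp;
} vheap;

static vheap *
vheap_create(int capacity, heap_cmp cmp)
{
    vheap  *h = malloc(sizeof(vheap));

    h->nodes = malloc(sizeof(void *) * capacity);
    h->size = 0;
    h->capacity = capacity;
    h->cmp = cmp;
    return h;
}

/* Append at the end, then sift up until the heap property holds. */
static void
vheap_add(vheap *h, void *v)
{
    int         i = h->size++;

    h->nodes[i] = v;
    while (i > 0)
    {
        int         parent = (i - 1) / 2;

        if (h->cmp(h->nodes[i], h->nodes[parent]) >= 0)
            break;
        void       *tmp = h->nodes[i];

        h->nodes[i] = h->nodes[parent];
        h->nodes[parent] = tmp;
        i = parent;
    }
}

/* Pop the minimum, move the last element to the root, sift down. */
static void *
vheap_remove_first(vheap *h)
{
    void       *top = h->nodes[0];

    h->nodes[0] = h->nodes[--h->size];
    for (int i = 0;;)
    {
        int         l = 2 * i + 1;
        int         r = l + 1;
        int         best = i;

        if (l < h->size && h->cmp(h->nodes[l], h->nodes[best]) < 0)
            best = l;
        if (r < h->size && h->cmp(h->nodes[r], h->nodes[best]) < 0)
            best = r;
        if (best == i)
            break;
        void       *tmp = h->nodes[i];

        h->nodes[i] = h->nodes[best];
        h->nodes[best] = tmp;
        i = best;
    }
    return top;
}

/* Example comparator for int elements stored by pointer. */
static int
int_cmp(const void *a, const void *b)
{
    return *(const int *) a - *(const int *) b;
}
```

A caller such as pg_restore would push TocEntry pointers and pop the highest-priority one; integers small enough to fit in a pointer could still be stored directly, which is essentially what the Datum-based version does today.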
Re: Transaction timeout
On 12.01.23 20:46, Andrey Borodin wrote:
> On Sun, Dec 18, 2022 at 12:53:31PM -0800, Andrey Borodin wrote:
>> I've rewritten this part to correctly report all timeouts that did
>> happen. However, there's now a tricky comma-formatting code which was
>> tested only manually. I suspect this will make translation difficult.
>
> I use special functions for this, like _():
>
> char *lock_reason = lock_timeout_occurred ? _("lock timeout") : "";
>
> and then
>
> ereport(ERROR,
>         (errcode(err_code),
>          errmsg("canceling statement due to %s%s%s%s%s",
>                 lock_reason, comma1, stmt_reason, comma2, tx_reason)));
>
> I hope it will be translatable...

No, you can't do that. You have to write out all the strings separately.
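What "write out all the strings separately" means in practice: each complete message is its own translatable string, so translators never see a sentence spliced together from fragments and commas. A hedged sketch -- the single-timeout messages mirror PostgreSQL's existing wording, but the combined messages and the helper function are invented for illustration, and _() here is just a stand-in for gettext():

```c
#include <string.h>

/* Stand-in for gettext(); in the server this marks strings for translation. */
#define _(x) (x)

/*
 * Hypothetical helper: one full, separately-translatable message per
 * combination of timeouts, instead of assembling "lock timeout", commas,
 * and friends into a single format string at runtime.
 */
static const char *
timeout_message(int lock_timeout, int stmt_timeout, int tx_timeout)
{
    if (lock_timeout && stmt_timeout)
        return _("canceling statement due to lock and statement timeout");
    if (lock_timeout && tx_timeout)
        return _("canceling statement due to lock and transaction timeout");
    if (lock_timeout)
        return _("canceling statement due to lock timeout");
    if (stmt_timeout)
        return _("canceling statement due to statement timeout");
    if (tx_timeout)
        return _("canceling statement due to transaction timeout");
    return _("canceling statement due to user request");
}
```

The cost is some repetition in the source, but every message is a complete phrase that can be translated naturally in languages where word order or conjunctions differ from English.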
Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
On Fri, Sep 1, 2023 at 6:13 AM Alexander Lakhin wrote:
> (Placing "pg_compiler_barrier();" just after "waiting = true;" fixed the
> issue for us.)

Maybe it'd be worth trying something stronger, like pg_memory_barrier(). A compiler barrier doesn't prevent the CPU from reordering loads and stores as it goes, and ARM64 has weak memory ordering.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
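The distinction drawn here can be sketched with C11 atomics: a compiler barrier only stops the compiler from reordering the two stores, while a full fence (the analogue of pg_memory_barrier()) also stops a weakly ordered CPU such as ARM64 from publishing them out of order. A simplified sketch of the pattern in question, not the actual latch/pgstat code:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-ins for the two flags involved in the reported lockup. */
static volatile bool waiting = false;
static volatile int wait_event_info = 0;

static void
start_wait(int info)
{
    waiting = true;

    /*
     * pg_compiler_barrier() is roughly asm volatile("" ::: "memory"): it
     * constrains only the compiler, so on ARM64 the CPU could still make
     * the wait_event_info store visible before the waiting store.  A full
     * fence, like pg_memory_barrier(), orders both stores for other CPUs
     * as well.
     */
    atomic_thread_fence(memory_order_seq_cst);

    wait_event_info = info;
}
```

A concurrent observer that sees the second store is then guaranteed to also see the first, which is the property the compiler-only barrier cannot provide on its own.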
Re: Avoid a possible null pointer (src/backend/utils/adt/pg_locale.c)
On Fri, Sep 1, 2023 at 11:47 AM Ranier Vilela wrote:
> If a null locale is reached in these paths,
> elog will dereference a null pointer.

True. That's sloppy coding. I don't know enough about this code to be sure whether the error messages that you propose are for the best.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
Re: [17] CREATE SUBSCRIPTION ... SERVER
On Fri, Sep 1, 2023 at 4:04 PM Jeff Davis wrote:
> On Thu, 2023-08-31 at 17:17 -0400, Joe Conway wrote:
> > Maybe move postgres_fdw to be a first class built in feature instead
> > of an extension?
>
> That could make sense, but we still have to solve the problem of how to
> present a built-in FDW.
>
> FDWs don't have a schema, so it can't be inside pg_catalog. So we'd
> need some special logic somewhere to make pg_dump and psql \dew work as
> expected, and I'm not quite sure what to do there.

I'm worried that an approach based on postgres_fdw would have security problems. I think that we don't want postgres_fdw installed in every PostgreSQL cluster for security reasons. And I think that the set of people who should be permitted to manage connection strings for logical replication subscriptions could be different from the set of people who are entitled to use postgres_fdw.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
Re: Inefficiency in parallel pg_restore with many tables
On Tue, Jul 25, 2023 at 2:53 PM Nathan Bossart wrote:
> On Mon, Jul 24, 2023 at 12:00:15PM -0700, Nathan Bossart wrote:
> > Here is a sketch of this approach. It required fewer #ifdefs than I was
> > expecting. At the moment, this one seems like the winner to me.
>
> Here is a polished patch set for this approach. I've also added a 0004
> that replaces the open-coded heap in pg_dump_sort.c with a binaryheap.
> IMHO these patches are in decent shape.

[ drive-by comment that hopefully doesn't cause too much pain ]

In hindsight, I think that making binaryheap depend on Datum was a bad idea. I think that was my idea, and I think it wasn't very smart. Considering that people have coded to that decision up until now, it might not be too easy to change at this point. But in principle I guess you'd want to be able to make a heap out of any C data type, rather than just Datum, or just Datum in the backend and just void * in the frontend.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
Re: SQL:2011 application time
On 9/1/23 03:50, Vik Fearing wrote: On 9/1/23 11:30, Peter Eisentraut wrote: 1) If I write UNIQUE (a, b, c WITHOUT OVERLAPS), does the WITHOUT OVERLAPS clause attach to the last column, or to the whole column list? In the SQL standard, you can only have one period and it has to be listed last, so this question does not arise. But here we are building a more general facility to then build the SQL facility on top of. So I think it doesn't make sense that the range column must be last or that there can only be one. Also, your implementation requires at least one non-overlaps column, which also seems like a confusing restriction. I think the WITHOUT OVERLAPS clause should be per-column, so that something like UNIQUE (a WITHOUT OVERLAPS, b, c WITHOUT OVERLAPS) would be possible. Then the WITHOUT OVERLAPS clause would directly correspond to the choice between equality or overlaps operator per column. An alternative interpretation would be that WITHOUT OVERLAPS applies to the whole column list, and we would take it to mean, for any range column, use the overlaps operator, for any non-range column, use the equals operator. But I think this would be confusing and would prevent the case of using the equality operator for some ranges and the overlaps operator for some other ranges in the same key. I prefer the first option. That is: WITHOUT OVERLAPS applies only to the column or expression it is attached to, and need not be last in line. I agree. The second option seems confusing and is more restrictive. I think allowing multiple uses of `WITHOUT OVERLAPS` (and in any position) is a great recommendation that enables a lot of new functionality. Several books[1,2] about temporal databases describe a multi-dimensional temporal space (even beyond application time vs. system time), and the standard is pretty disappointing here. It's not a weird idea. But I just want to be explicit that this isn't something the standard describes. 
(I think everyone in the conversation so far understands that.) So far I've tried to be pretty scrupulous about following SQL:2011, although personally I'd rather see Postgres support this functionality. And it's not like it goes *against* what the standard says. But if there are any objections, I'd love to hear them before putting in the work. :-) If we allow multiple+anywhere WITHOUT OVERLAPS in PRIMARY KEY & UNIQUE constraints, then surely we also allow multiple+anywhere PERIOD in FOREIGN KEY constraints too. (I guess the standard switched keywords because a FK is more like "MUST OVERLAPS". :-) Also if you have multiple application-time dimensions we probably need to allow multiple FOR PORTION OF clauses. I think the syntax would be: UPDATE t FOR PORTION OF valid_at FROM ... TO ... FOR PORTION OF asserted_at FROM ... TO ... [...] SET foo = bar Does that sound okay? I don't quite understand this part: >> Also, your implementation >> requires at least one non-overlaps column, which also seems like a >> confusing restriction. That's just a regular non-temporal constraint. Right? If I'm missing something let me know. [1] C. J. Date, Hugh Darwen, Nikos Lorentzos. Time and Relational Theory, Second Edition: Temporal Databases in the Relational Model and SQL. 2nd edition, 2014. [2] Tom Johnston. Bitemporal Data: Theory and Practice. 2014. -- Paul ~{:-) p...@illuminatedcomputing.com
Re: Replace known_assigned_xids_lck by memory barrier
On Fri, Sep 1, 2023 at 3:41 PM Nathan Bossart wrote:
> On Wed, Aug 16, 2023 at 01:07:15PM -0700, Nathan Bossart wrote:
> > Ah, that explains it. v5 of the patch is attached.
>
> Barring additional feedback, I plan to commit this patch in the current
> commitfest.

I'm not an expert on this code but I looked at this patch briefly and it seems OK to me.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
Re: Add 'worker_type' to pg_stat_subscription
On Wed, Aug 16, 2023 at 07:14:18PM +1000, Peter Smith wrote:
> Earlier this year I proposed a small change for the pg_stat_subscription view:
>
> --
> ...it would be very useful to have an additional "kind" attribute for
> this view. This will save the user from needing to do mental
> gymnastics every time just to recognise what kind of process they are
> looking at.
> --
>
> At that time Amit replied [1] that this could be posted as a separate
> enhancement thread.
>
> Now that the LogicalRepWorkerType has been recently pushed [2]
> (something with changes in the same area of the code) it seemed the
> right time to resurrect my pg_stat_subscription proposal.

This sounds generally reasonable to me.

+       worker_type text
+
+       Type of the subscription worker process.  Possible values are:
+
+       a: apply worker
+       p: parallel apply worker
+       t: tablesync worker

Is there any reason not to spell out the names? I think that would match the other system views better (e.g., backend_type in pg_stat_activity). Also, instead of "tablesync worker", I'd suggest using "synchronization worker" to match the name used elsewhere in this table.

I see that the table refers to "leader apply workers". Would those show up as parallel apply workers in the view? Can we add another worker type for those?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Why doesn't Vacuum FULL update the VM
Hi,

I noticed that VACUUM FULL actually does freeze the tuples in the rewritten table (heap_freeze_tuple()) but then it doesn't mark them all visible or all frozen in the visibility map. I don't understand why. It seems like it would save us future work.

Here is an example:

create extension pg_visibility;
drop table if exists foo;
create table foo(a int) with (autovacuum_enabled=false);
insert into foo select i%3 from generate_series(1,300)i;
update foo set a = 5 where a = 2;
select * from pg_visibility_map_summary('foo');
vacuum (verbose) foo;
select * from pg_visibility_map_summary('foo');
vacuum (full, verbose) foo;
select * from pg_visibility_map_summary('foo');

I don't see why the visibility map shouldn't be updated so that all of the pages show all visible and all frozen for this relation after the vacuum full.

- Melanie
Re: Eager page freeze criteria clarification
On Tue, Aug 29, 2023 at 10:22 AM Robert Haas wrote:
> Let's assume for a moment that the rate at which the insert LSN is
> advancing is roughly constant over time, so that it serves as a good
> proxy for wall-clock time. Consider four tables A, B, C, and D that
> are, respectively, vacuumed once per minute, once per hour, once per
> day, and once per week. With a 33% threshold, pages in table A will be
> frozen if they haven't been modified in 20 seconds, pages in table B
> will be frozen if they haven't been modified in 20 minutes, pages in
> table C will be frozen if they haven't been modified in 8 hours, and
> pages in table D will be frozen if they haven't been modified in 2
> days, 8 hours. My intuition is that this feels awfully aggressive for
> A and awfully passive for D.
>
> [ discussion of freeze-on-evict ]

Another way of thinking about this is: instead of replacing this heuristic with a complicated freeze-on-evict system, maybe we just need to adjust the heuristic. Maybe using an LSN is a good idea, but maybe the LSN of the last table vacuum isn't the right one to be using, or maybe it shouldn't be the only one that we use.

For instance, what about using the redo LSN of the last checkpoint, or the checkpoint before the last checkpoint, or something like that? Somebody might find that unprincipled, but it seems to me that the checkpoint cycle has a lot to do with whether or not opportunistic freezing makes sense. If a page is likely to be modified again before a checkpoint forces it to be written, then freezing it is likely a waste. If it's likely to be written out of shared_buffers before it's modified again, then freezing it now is a pretty good bet. A given page could be evicted from shared_buffers, and thus written, sooner than the next checkpoint, but if it's dirty now, it definitely won't be written any later than the next checkpoint. By looking at the LSN of a page that I'm about to modify just before I modify it, I can make some kind of a guess as to whether this is a page that is being modified more or less than once per checkpoint cycle, and adjust freezing behavior accordingly.

One could also use a hybrid of the two values, e.g. normally use the insert LSN of the last VACUUM, but if that's newer than the redo LSN of the last checkpoint, then use the latter instead, to avoid doing too much freezing of pages that may have been quiescent for only a few tens of seconds. I don't know if that's better or worse.

As I think about it, I realize that I don't really know why Andres suggested a last-vacuum-LSN-based heuristic in the first place. Earlier, I wrote that "One thing I really like about it is that if the table is being vacuumed frequently, then we freeze less aggressively, and if the table is being vacuumed infrequently, then we freeze more aggressively." But actually, I don't think it does that. If the table is being vacuumed frequently, then the last-vacuum-LSN will be newer, which means we'll freeze *more* aggressively. And I'm not sure why we want to do that. If the table is being vacuumed a lot, it's probably also being modified a lot, which suggests that we ought to be more cautious about freezing, rather than the reverse.

Just spitballing here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
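The hybrid described here -- the last vacuum's insert LSN as the reference point, clamped to the last checkpoint's redo LSN when the vacuum is very recent, with the thread's 33% threshold -- can be sketched as follows. All names and the exact arithmetic are illustrative, not server code:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/*
 * Hedged sketch: decide whether to opportunistically freeze a page based
 * on how old its LSN is relative to a reference interval ending at the
 * current insert position.
 */
static bool
should_freeze_page(XLogRecPtr page_lsn,
                   XLogRecPtr insert_lsn,
                   XLogRecPtr last_vacuum_lsn,
                   XLogRecPtr checkpoint_redo_lsn)
{
    XLogRecPtr  reference = last_vacuum_lsn;

    /*
     * Clamp: if the last vacuum is newer than the last checkpoint's redo
     * pointer, use the redo pointer instead, so very frequent vacuums
     * don't shrink the interval (and thus sharpen freezing) arbitrarily.
     */
    if (reference > checkpoint_redo_lsn)
        reference = checkpoint_redo_lsn;

    if (insert_lsn <= reference)
        return false;

    /*
     * Freeze pages not modified within the most recent third of the
     * reference interval (the 33% threshold from the thread).
     */
    XLogRecPtr  cutoff = insert_lsn - (insert_lsn - reference) / 3;

    return page_lsn < cutoff;
}
```

With the clamp in place, a table vacuumed every few seconds no longer gets an arbitrarily recent reference point, so freezing stops becoming more aggressive as vacuums get more frequent -- which is exactly the behavior being questioned.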
Speaker Bureau
Greetings,

If you are on the speaker list, can you send an email to ugc...@postgresql.us indicating whether you are available to travel for meetups? This serves the obvious purpose, but also provides us with your email address.

Thanks,

Dave Cramer
Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
Hello Tomas,

01.09.2023 16:00, Tomas Vondra wrote:
> Hmmm, I'm not very good at reading the binary code, but here's what
> objdump produced for WaitEventSetWait. Maybe someone will see what the
> issue is.

At first glance, I can't see anything suspicious in the disassembly. IIUC, waiting = true is presented there as:

  805c38: b902ad18  str  w24, [x8, #684]  // pgstat_report_wait_start(): proc->wait_event_info = wait_event_info;
                                          // end of pgstat_report_wait_start(wait_event_info);
  805c3c: b0ffdb09  adrp x9, 0x366000
  805c40: b0ffdb0a  adrp x10, 0x366000
  805c44: feeb      adrp x11, 0x9e4000
  805c48: 52800028  mov  w8, #1           // true
  805c4c: 52800319  mov  w25, #24
  805c50: 5280073a  mov  w26, #57
  805c54: fd446128  ldr  d8, [x9, #2240]
  805c58: 9d7b      adrp x27, 0x9b1000
  805c5c: fd415949  ldr  d9, [x10, #688]
  805c60: f9071d68  str  x8, [x11, #3640] // waiting = true (x8 = w8)

So there are two simple movs and two load operations performed in parallel, but I don't think it's similar to what we had in that case.

I thought about just adding the barrier to the code, but then how would we know that this was the issue and that the barrier fixed it? It happens so rarely that we can't draw any conclusions from a couple of test runs. I could probably construct a reproducer for the lockup if I had access to such a machine for a day or two.

Best regards,
Alexander
Re: [17] CREATE SUBSCRIPTION ... SERVER
On Thu, 2023-08-31 at 17:17 -0400, Joe Conway wrote:
> Maybe move postgres_fdw to be a first class built in feature instead
> of an extension?

That could make sense, but we still have to solve the problem of how to present a built-in FDW.

FDWs don't have a schema, so it can't be inside pg_catalog. So we'd need some special logic somewhere to make pg_dump and psql \dew work as expected, and I'm not quite sure what to do there.

Regards,
Jeff Davis
Re: [17] CREATE SUBSCRIPTION ... SERVER
On Fri, 2023-09-01 at 12:28 +0530, Ashutosh Bapat wrote:
> Thinking larger, how about we allow any FDW to be used here.

That's a possibility, but I think it means the subscription would need to constantly re-check the parameters rather than relying on the FDW's validator. Otherwise it might be the wrong kind of FDW, and the user might be able to circumvent the password_required protection. It might not even be a postgres-related FDW at all, which would be a bit strange.

If it's constantly re-checking the parameters, that raises the possibility that some ALTER SERVER or ALTER USER MAPPING succeeds, but subscriptions to that foreign server then start failing, which would not be ideal. But I could be fine with that.

> But I think there's some value in bringing
> together these two subsystems which deal with foreign data logically
> (as in logical vs physical view of data).

I still don't understand how a core dependency on an extension would work.

Regards,
Jeff Davis
Re: Replace known_assigned_xids_lck by memory barrier
On Wed, Aug 16, 2023 at 01:07:15PM -0700, Nathan Bossart wrote:
> Ah, that explains it. v5 of the patch is attached.

Barring additional feedback, I plan to commit this patch in the current commitfest.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: should frontend tools use syncfs() ?
On Fri, Sep 01, 2023 at 01:19:13PM -0500, Justin Pryzby wrote:
> What about (per git grep no-sync doc) pg_receivewal?

I don't think it's applicable there, either. IIUC that option specifies whether to sync the data as it is streamed over.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: should frontend tools use syncfs() ?
On Fri, Sep 01, 2023 at 11:08:51AM -0700, Nathan Bossart wrote:
> > This should probably give a distinct error when syncfs is not supported
> > than when it is truly unrecognized.
>
> Later versions of the patch should have this.

Oops, right.

> > The patch should handle pg_dumpall, too.
>
> It looks like pg_dumpall only ever fsyncs a single file, so I don't think
> it is really needed there.

What about (per git grep no-sync doc) pg_receivewal?

-- 
Justin
Re: should frontend tools use syncfs() ?
Thanks for taking a look.

On Fri, Sep 01, 2023 at 12:58:10PM -0500, Justin Pryzby wrote:
>> +	if (!user_opts.sync_method)
>> +		user_opts.sync_method = pg_strdup("fsync");
>
> why pstrdup?

I believe I was just following the precedent set by some of the other options.

>> +parse_sync_method(const char *optarg, SyncMethod *sync_method)
>> +{
>> +	if (strcmp(optarg, "fsync") == 0)
>> +		*sync_method = SYNC_METHOD_FSYNC;
>> +#ifdef HAVE_SYNCFS
>> +	else if (strcmp(optarg, "syncfs") == 0)
>> +		*sync_method = SYNC_METHOD_SYNCFS;
>> +#endif
>> +	else
>> +	{
>> +		pg_log_error("unrecognized sync method: %s", optarg);
>> +		return false;
>> +	}
>
> This should probably give a distinct error when syncfs is not supported
> than when it is truly unrecognized.

Later versions of the patch should have this.

> The patch should handle pg_dumpall, too.

It looks like pg_dumpall only ever fsyncs a single file, so I don't think it is really needed there.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: should frontend tools use syncfs() ?
> + if (!user_opts.sync_method) > + user_opts.sync_method = pg_strdup("fsync"); why pstrdup? > +parse_sync_method(const char *optarg, SyncMethod *sync_method) > +{ > + if (strcmp(optarg, "fsync") == 0) > + *sync_method = SYNC_METHOD_FSYNC; > +#ifdef HAVE_SYNCFS > + else if (strcmp(optarg, "syncfs") == 0) > + *sync_method = SYNC_METHOD_SYNCFS; > +#endif > + else > + { > + pg_log_error("unrecognized sync method: %s", optarg); > + return false; > + } This should probably give a distinct error when syncfs is not supported than when it's truly unrecognized. The patch should handle pg_dumpall, too. Note that /bin/sync doesn't try to de-duplicate, it does just what you tell it. $ strace -e openat,syncfs,fsync sync / / / -f ... openat(AT_FDCWD, "/", O_RDONLY|O_NONBLOCK) = 3 syncfs(3) = 0 openat(AT_FDCWD, "/", O_RDONLY|O_NONBLOCK) = 3 syncfs(3) = 0 openat(AT_FDCWD, "/", O_RDONLY|O_NONBLOCK) = 3 syncfs(3) = 0 +++ exited with 0 +++
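The distinct-error behavior suggested in the review could look roughly like this — a standalone C sketch, not the committed code (logging goes through fprintf as a stand-in for pg_log_error, and the exact message wording is an assumption):

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Sync methods, following the names in the quoted patch. */
typedef enum
{
	SYNC_METHOD_FSYNC,
	SYNC_METHOD_SYNCFS
} SyncMethod;

/*
 * Parse a sync method name.  "syncfs" on a build without HAVE_SYNCFS gets
 * its own error message, distinct from a truly unrecognized method.
 */
static bool
parse_sync_method(const char *arg, SyncMethod *sync_method)
{
	if (strcmp(arg, "fsync") == 0)
		*sync_method = SYNC_METHOD_FSYNC;
	else if (strcmp(arg, "syncfs") == 0)
	{
#ifdef HAVE_SYNCFS
		*sync_method = SYNC_METHOD_SYNCFS;
#else
		/* known method, but not supported on this platform */
		fprintf(stderr, "sync method \"%s\" is not supported on this platform\n", arg);
		return false;
#endif
	}
	else
	{
		fprintf(stderr, "unrecognized sync method: %s\n", arg);
		return false;
	}
	return true;
}
```

With this shape, a user typing "syncfs" on an unsupported platform learns the method exists but is unavailable, instead of being told it is unknown.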
Re: cataloguing NOT NULL constraints
On 2023-Aug-31, Alvaro Herrera wrote: > Hmm, that's some weird code I left there all right. Can you please try > this patch? (Not final; I'll review it more completely later, > particularly to add this test case.) The change in MergeAttributesIntoExisting turned out to be close but not quite there, so I pushed another version of the fix. In case you're wondering, the change in MergeConstraintsIntoExisting is a related but different case, for which I added the other test case you see there. I also noticed, while looking at this, that there's another problem when a child has a NO INHERIT not-null constraint and the parent has a primary key in the same column. It should refuse, or take over by marking it no longer NO INHERIT. But it just accepts silently and all appears to be good. The problems appear when you add a child to that child. I'll look into this later; it's not exactly the same code. At least it's not a crasher. -- Álvaro Herrera — Breisgau, Deutschland — https://www.EnterpriseDB.com/
Re: Adding a pg_get_owned_sequence function?
On Fri, Sep 01, 2023 at 01:31:43PM -0400, Tom Lane wrote: > Nathan Bossart writes: >> I wonder if it'd be possible to just remove pg_get_serial_sequence(). > > A quick search at http://codesearch.debian.net/ finds uses of it > in packages like gdal, qgis, rails, ... We could maybe get rid of > it after a suitable deprecation period, but I think we can't just > summarily remove it. Given that, I'd still vote for marking it deprecated, but I don't feel strongly about actually removing it anytime soon (or anytime at all, really). -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: GenBKI emits useless open;close for catalogs without rows
Alvaro Herrera writes: > On 2023-Sep-01, Matthias van de Meent wrote: >> A potential addition to the patch would be to stop manually closing >> relations: initdb and check-world succeed without manual 'close' >> operations because the 'open' command auto-closes the previous open >> relation (in boot_openrel). Testing also suggests that the last opened >> relation apparently doesn't need closing - check-world succeeds >> without issues (incl. with TAP enabled). That is therefore implemented >> in attached patch 2 - it removes the 'close' syntax in its entirety. > Hmm, what happens with the last relation in the bootstrap process? Is > closerel() called via some other path for that one? Taking a quick census of existing closerel() callers: there is cleanup() in bootstrap.c, but it's called uncomfortably late and outside any transaction, so I misdoubt that it works properly if asked to actually shoulder any responsibility. (A little code reshuffling could fix that.) There are also a couple of low-level elog warnings in CREATE that would likely get triggered, though I suppose we could just remove those elogs. I guess my reaction to this patch is "why bother?". It seems unlikely to yield any measurable benefit, though of course that guess could be wrong. regards, tom lane
Re: Inefficiency in parallel pg_restore with many tables
On Fri, Sep 01, 2023 at 01:41:41PM -0400, Tom Lane wrote: > I've not actually looked at any of these patchsets after the first one. > I have added myself as a reviewer and will hopefully get to it within > a week or so. Thanks! -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: GenBKI emits useless open;close for catalogs without rows
On Fri, 1 Sept 2023 at 19:43, Alvaro Herrera wrote: > > On 2023-Sep-01, Matthias van de Meent wrote: > > > A potential addition to the patch would be to stop manually closing > > relations: initdb and check-world succeed without manual 'close' > > operations because the 'open' command auto-closes the previous open > > relation (in boot_openrel). Testing also suggests that the last opened > > relation apparently doesn't need closing - check-world succeeds > > without issues (incl. with TAP enabled). That is therefore implemented > > in attached patch 2 - it removes the 'close' syntax in its entirety. > > Hmm, what happens with the last relation in the bootstrap process? Is > closerel() called via some other path for that one? There is a final cleanup() call that closes the last open boot_reldesc relation (if any) at the end of BootstrapModeMain, after boot_yyparse has completed and its changes have been committed. - Matthias
Re: GenBKI emits useless open;close for catalogs without rows
On 2023-Sep-01, Matthias van de Meent wrote: > A potential addition to the patch would be to stop manually closing > relations: initdb and check-world succeed without manual 'close' > operations because the 'open' command auto-closes the previous open > relation (in boot_openrel). Testing also suggests that the last opened > relation apparently doesn't need closing - check-world succeeds > without issues (incl. with TAP enabled). That is therefore implemented > in attached patch 2 - it removes the 'close' syntax in its entirety. Hmm, what happens with the last relation in the bootstrap process? Is closerel() called via some other path for that one? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ Essentially, you're proposing Kevlar shoes as a solution for the problem that you want to walk around carrying a loaded gun aimed at your foot. (Tom Lane)
Re: Inefficiency in parallel pg_restore with many tables
Nathan Bossart writes: > I'm hoping to commit these patches at some point in the current commitfest. > I don't sense anything tremendously controversial, and they provide a > pretty nice speedup in some cases. Are there any remaining concerns? I've not actually looked at any of these patchsets after the first one. I have added myself as a reviewer and will hopefully get to it within a week or so. regards, tom lane
Re: Adding a pg_get_owned_sequence function?
Nathan Bossart writes: > I wonder if it'd be possible to just remove pg_get_serial_sequence(). A quick search at http://codesearch.debian.net/ finds uses of it in packages like gdal, qgis, rails, ... We could maybe get rid of it after a suitable deprecation period, but I think we can't just summarily remove it. regards, tom lane
GenBKI emits useless open;close for catalogs without rows
Hi, Whilst looking at PostgreSQL's bootstrapping process, I noticed that postgres.bki contains quite a few occurrences of the pattern "open $catname; close $catname". I suppose this pattern isn't too expensive, but according to my limited research a combined open+close cycle doesn't do anything meaningful, so it does waste some CPU cycles in the process. The attached patch 1 removes the occurrences of those combined open/close statements in postgres.bki. Locally it passes check-world, so I assume that opening and closing a table is indeed not required for initializing a data-less catalog during bootstrapping. A potential addition to the patch would be to stop manually closing relations: initdb and check-world succeed without manual 'close' operations because the 'open' command auto-closes the previous open relation (in boot_openrel). Testing also suggests that the last opened relation apparently doesn't need closing - check-world succeeds without issues (incl. with TAP enabled). That is therefore implemented in attached patch 2 - it removes the 'close' syntax in its entirety. Kind regards, Matthias van de Meent Neon (https://neon.tech) 0002-Remove-the-bki-close-command.patch Description: Binary data 0001-Stop-emitting-open-nodata-close-patterns-in-genbki.p.patch Description: Binary data
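The auto-close behavior that patch 2 relies on can be illustrated with a toy model — the names mirror bootstrap.c, but the structure here is an invented sketch for illustration, not the actual bootstrap code:

```c
#include <stdio.h>
#include <string.h>

static char boot_reldesc[64];	/* currently open relation, "" if none */
static char closed_log[256];	/* record of close operations, for clarity */

/* Close the currently open relation, if any. */
static void
closerel(void)
{
	if (boot_reldesc[0] == '\0')
		return;
	strcat(closed_log, boot_reldesc);
	strcat(closed_log, ";");
	boot_reldesc[0] = '\0';
}

/*
 * 'open' auto-closes whatever was open before, which is why an explicit
 * 'close' immediately before an 'open' is redundant in postgres.bki.
 */
static void
boot_openrel(const char *name)
{
	closerel();
	snprintf(boot_reldesc, sizeof(boot_reldesc), "%s", name);
}
```

Under this model, a script of back-to-back opens closes each previous relation implicitly; only the very last open relation still needs a final cleanup, which is the question raised downthread.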
Re: Inefficiency in parallel pg_restore with many tables
On Tue, Jul 25, 2023 at 11:53:36AM -0700, Nathan Bossart wrote: > Here is a polished patch set for this approach. I've also added a 0004 > that replaces the open-coded heap in pg_dump_sort.c with a binaryheap. > IMHO these patches are in decent shape. I'm hoping to commit these patches at some point in the current commitfest. I don't sense anything tremendously controversial, and they provide a pretty nice speedup in some cases. Are there any remaining concerns? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
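For readers unfamiliar with the binaryheap structure that the 0004 patch switches pg_dump_sort.c to, a minimal array-backed min-heap works like this — a simplified sketch with fixed capacity and int keys (PostgreSQL's src/common/binaryheap.c is generic over a comparator and grows dynamically):

```c
#define HEAP_CAP 64

typedef struct
{
	int		items[HEAP_CAP];
	int		size;
} MinHeap;

static void
heap_swap(MinHeap *h, int a, int b)
{
	int		tmp = h->items[a];

	h->items[a] = h->items[b];
	h->items[b] = tmp;
}

/* Add a value, then bubble it up until its parent is no larger. */
static void
heap_push(MinHeap *h, int v)
{
	int		i = h->size++;

	h->items[i] = v;
	while (i > 0 && h->items[(i - 1) / 2] > h->items[i])
	{
		heap_swap(h, (i - 1) / 2, i);
		i = (i - 1) / 2;
	}
}

/* Remove and return the smallest value; sift the last element down. */
static int
heap_pop(MinHeap *h)
{
	int		top = h->items[0];
	int		i = 0;

	h->items[0] = h->items[--h->size];
	for (;;)
	{
		int		l = 2 * i + 1,
				r = 2 * i + 2,
				s = i;

		if (l < h->size && h->items[l] < h->items[s])
			s = l;
		if (r < h->size && h->items[r] < h->items[s])
			s = r;
		if (s == i)
			break;
		heap_swap(h, i, s);
		i = s;
	}
	return top;
}
```

Push and pop are both O(log n), which is what makes a heap attractive for repeatedly picking the next-best ready item during parallel restore scheduling.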
Re: Adding a pg_get_owned_sequence function?
On Fri, Jun 09, 2023 at 08:19:44PM +0100, Dagfinn Ilmari Mannsåker wrote: > I've always been annoyed by the fact that pg_get_serial_sequence takes > the table and returns the sequence as strings rather than regclass. And > since identity columns were added, the name is misleading as well (which > is even acknowledged in the docs, together with a suggestion for a > better name). > > So, instead of making excuses in the documentation, I thought why not > add a new function which addresses all of these issues, and document the > old one as a backward-compatibilty wrapper? This sounds generally reasonable to me. That note has been there since 2006 (2b2a507). I didn't find any further discussion about this on the lists. > +A backwards-compatibility wrapper > +for pg_get_owned_sequence, which > +uses text for the table and sequence names instead of > +regclass. The first parameter is a table name with > optional I wonder if it'd be possible to just remove pg_get_serial_sequence(). Assuming 2b2a507 removed the last use of it in pg_dump, any dump files created on versions >= v8.2 shouldn't use it. But I suppose it wouldn't be too much trouble to keep it around for anyone who happens to need it. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Is there a complete doc to describe pg's transaction implementation in detail?
Casual Meson fixups
The first two patches in the series are re-proposals that had previously been approved[0] by Andres, but fell through the cracks. The only patch that _could_ be controversial is probably the last one, but from my understanding it would match up with the autotools build. One thing that I did notice while testing this patch is that Muon doesn't build postgres without coercing the build a bit. I had to disable nls and plpython. The nls issue could be fixed with a bump to Meson 0.59, which introduces import(required:). nls isn't supported in Muon unfortunately at the moment. The plpython issue is that it doesn't understand pymod.find_installation(required:), which is a bug in Muon. Muon development has slowed quite a bit this year. Postgres is probably the largest project which tries its best to support Muon. It seems like if we want to keep supporting Muon, we should get a buildfarm machine to use it instead of Meson to catch regressions. OR we should contemplate removing support for it. Alternatively someone (me?) could step up and provide some patches to Muon to make the postgres experience better. But I wonder if any Postgres user even uses Muon to build it. -- Tristan Partin Neon (https://neon.tech) From 462ea170b1410d7916a25ffa6d27dc45284db7e3 Mon Sep 17 00:00:00 2001 From: Tristan Partin Date: Wed, 17 May 2023 10:36:52 -0500 Subject: [PATCH v1 2/7] Add Meson override for libpq Meson has the ability to do transparent overrides when projects are used as subprojects. For instance, say I am building a Postgres extension. 
I can define Postgres to be a subproject of my extension given the following wrap file: [wrap-git] url = https://git.postgresql.org/git/postgresql.git revision = master depth = 1 [provide] dependency_names = libpq Then in my extension (root project), I can have the following line snippet: libpq = dependency('libpq') This will tell Meson to transparently compile libpq prior to it compiling my extension (because I depend on libpq) if libpq isn't found on the host system. --- src/interfaces/libpq/meson.build | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/interfaces/libpq/meson.build b/src/interfaces/libpq/meson.build index 80e6a15adf..6d18970e81 100644 --- a/src/interfaces/libpq/meson.build +++ b/src/interfaces/libpq/meson.build @@ -84,6 +84,8 @@ libpq = declare_dependency( include_directories: [include_directories('.')] ) +meson.override_dependency('libpq', libpq) + pkgconfig.generate( name: 'libpq', description: 'PostgreSQL libpq library', -- Tristan Partin Neon (https://neon.tech) From 5455426c9944ff8c8694db46929eaa37e03d907f Mon Sep 17 00:00:00 2001 From: Tristan Partin Date: Fri, 1 Sep 2023 11:07:40 -0500 Subject: [PATCH v1 7/7] Disable building contrib targets by default This matches the autotools build. --- contrib/meson.build | 4 +++- contrib/oid2name/meson.build | 2 +- contrib/vacuumlo/meson.build | 2 +- 3 files changed, 5 insertions(+), 3 deletions(-) diff --git a/contrib/meson.build b/contrib/meson.build index 618b1c50c8..06c579def5 100644 --- a/contrib/meson.build +++ b/contrib/meson.build @@ -1,6 +1,8 @@ # Copyright (c) 2022-2023, PostgreSQL Global Development Group -contrib_mod_args = pg_mod_args +# These will get built during test runs or if the contrib target is ran. 
+contrib_bin_args = default_bin_args + { 'build_by_default': false } +contrib_mod_args = pg_mod_args + { 'build_by_default': false } contrib_data_dir = dir_data_extension contrib_data_args = { diff --git a/contrib/oid2name/meson.build b/contrib/oid2name/meson.build index 171d2d226b..904c94a0e1 100644 --- a/contrib/oid2name/meson.build +++ b/contrib/oid2name/meson.build @@ -13,7 +13,7 @@ endif oid2name = executable('oid2name', oid2name_sources, dependencies: [frontend_code, libpq], - kwargs: default_bin_args, + kwargs: contrib_bin_args, ) contrib_targets += oid2name diff --git a/contrib/vacuumlo/meson.build b/contrib/vacuumlo/meson.build index 9fa7380590..a059dda65a 100644 --- a/contrib/vacuumlo/meson.build +++ b/contrib/vacuumlo/meson.build @@ -13,7 +13,7 @@ endif vacuumlo = executable('vacuumlo', vacuumlo_sources, dependencies: [frontend_code, libpq], - kwargs: default_bin_args, + kwargs: contrib_bin_args, ) contrib_targets += vacuumlo -- Tristan Partin Neon (https://neon.tech) From 6c7b3bde703014ccec91f3f6a8a7af21b36d0a1f Mon Sep 17 00:00:00 2001 From: Tristan Partin Date: Wed, 17 May 2023 09:40:02 -0500 Subject: [PATCH v1 1/7] Make finding pkg-config(python3) more robust It is a possibility that the installation can't be found. --- meson.build | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/meson.build b/meson.build index 5422885b0a..bd9cdb797e 100644 --- a/meson.build +++ b/meson.build @@ -1056,15 +1056,17 @@ endif ### pyopt = get_option('plpython') +python3_dep = not_found_dep if not pyopt.disabled()
Re: PATCH: Add REINDEX tag to event triggers
On Fri, Sep 1, 2023 at 6:40 PM Jim Jones wrote: > > On 01.09.23 11:23, jian he wrote: > > because the change made in here: > > https://git.postgresql.org/cgit/postgresql.git/diff/src/backend/commands/indexcmds.c?id=11af63fb48d278b86aa948a5b57f136ef03c2bb7 > > > > I manually slight edited the patch content: > > from > > static List *ChooseIndexColumnNames(List *indexElems); > > static void ReindexIndex(RangeVar *indexRelation, ReindexParams *params, > >bool isTopLevel); > > to > > static List *ChooseIndexColumnNames(const List *indexElems); > > static void ReindexIndex(const RangeVar *indexRelation, const > > ReindexParams *params, > > bool isTopLevel); > > > > can you try the attached one. > > The patch applies cleanly. However, now the compiler complains about a > qualifier mismatch > > indexcmds.c:2707:33: warning: assignment discards ‘const’ qualifier from > pointer target type [-Wdiscarded-qualifiers] > 2707 | c > > > Perhaps 'const ReindexStmt *currentReindexStatement'? > > Tests also work just fine. I also tested it against unique and partial > indexes, and it worked as expected. > > Jim > Thanks for pointing this out! also due to commit https://git.postgresql.org/cgit/postgresql.git/commit/?id=11af63fb48d278b86aa948a5b57f136ef03c2bb7 ExecReindex function input arguments also changed. so we need to rebase this patch. change to currentReindexStatement = unconstify(ReindexStmt*, stmt); seems to work for me. I tested it, no warning. But I am not 100% sure. anyway I refactored the patch, making it git applyable also change from "currentReindexStatement = stmt;" to "currentReindexStatement = unconstify(ReindexStmt*, stmt);" due to ExecReindex function changes. From a3b6775362e07d53b367bcf303a3535d24f42cce Mon Sep 17 00:00:00 2001 From: pgaddict Date: Fri, 1 Sep 2023 23:51:46 +0800 Subject: [PATCH v3] Add REINDEX tag to event triggers This turns on the event tag REINDEX. 
This will now return a record for each index that was modified during as part of start/end ddl commands. For concurrent reindex, this will return the OID for the new index. This includes concurrently reindexing tables and databases. It will only report the new index records and not the ones destroyed by the concurrent reindex process. Author: Garrett Thornburg --- doc/src/sgml/event-trigger.sgml | 8 ++ src/backend/commands/indexcmds.c| 83 + src/backend/tcop/utility.c | 12 +- src/include/tcop/cmdtaglist.h | 2 +- src/test/regress/expected/event_trigger.out | 126 src/test/regress/sql/event_trigger.sql | 99 +++ 6 files changed, 327 insertions(+), 3 deletions(-) diff --git a/doc/src/sgml/event-trigger.sgml b/doc/src/sgml/event-trigger.sgml index 3b6a5361..aa1b359f 100644 --- a/doc/src/sgml/event-trigger.sgml +++ b/doc/src/sgml/event-trigger.sgml @@ -1013,6 +1013,14 @@ - + +REINDEX +X +X +- +- + + REVOKE X diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index ab8b81b3..483b0665 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -95,6 +95,8 @@ static char *ChooseIndexNameAddition(const List *colnames); static List *ChooseIndexColumnNames(const List *indexElems); static void ReindexIndex(const RangeVar *indexRelation, const ReindexParams *params, bool isTopLevel); +static void reindex_event_trigger_collect(Oid oid); +static void reindex_event_trigger_collect_relation(Oid oid); static void RangeVarCallbackForReindexIndex(const RangeVar *relation, Oid relId, Oid oldRelId, void *arg); static Oid ReindexTable(const RangeVar *relation, const ReindexParams *params, @@ -111,6 +113,8 @@ static bool ReindexRelationConcurrently(Oid relationOid, static void update_relispartition(Oid relationId, bool newval); static inline void set_indexsafe_procflags(void); +static ReindexStmt *currentReindexStatement = NULL; + /* * callback argument type for RangeVarCallbackForReindexIndex() */ @@ -2696,6 +2700,12 @@ 
ExecReindex(ParseState *pstate, const ReindexStmt *stmt, bool isTopLevel) bool verbose = false; char *tablespacename = NULL; + /* + * Make current stmt available for event triggers without directly passing + * the context to every subsequent call. + */ + currentReindexStatement = unconstify(ReindexStmt*, stmt); + /* Parse option list */ foreach(lc, stmt->params) { @@ -2776,6 +2786,12 @@ ExecReindex(ParseState *pstate, const ReindexStmt *stmt, bool isTopLevel) (int) stmt->kind); break; } + + /* + * Clear the reindex stmt global reference now that triggers should have + * completed + */ + currentReindexStatement = NULL; } /* @@ -2827,6 +2843,8 @@ ReindexIndex(const RangeVar *indexRelation, const ReindexParams *params, bool is newparams.options |=
Add const qualifiers
Hackers, I noticed that there was a mismatch between the const qualifiers for excludeDirContents in src/backend/backup/basebackup.c and src/bin/pg_rewind/file_map.c and that led me to use ^static const.*\*.*= to do a quick search for similar cases. I think at the least we should make excludeDirContents match, but the rest of the changes seem like a good idea as well. Regards, -David diff --git a/contrib/fuzzystrmatch/fuzzystrmatch.c b/contrib/fuzzystrmatch/fuzzystrmatch.c index 5686497983..fc9cfbeda4 100644 --- a/contrib/fuzzystrmatch/fuzzystrmatch.c +++ b/contrib/fuzzystrmatch/fuzzystrmatch.c @@ -55,7 +55,7 @@ static void _soundex(const char *instr, char *outstr); #define SOUNDEX_LEN 4 /* ABCDEFGHIJKLMNOPQRSTUVWXYZ */ -static const char *soundex_table = "01230120022455012623010202"; +static const char *const soundex_table = "01230120022455012623010202"; static char soundex_code(char letter) diff --git a/contrib/pgcrypto/pgp-armor.c b/contrib/pgcrypto/pgp-armor.c index 9128756647..bfc90af063 100644 --- a/contrib/pgcrypto/pgp-armor.c +++ b/contrib/pgcrypto/pgp-armor.c @@ -178,8 +178,8 @@ pg_base64_dec_len(unsigned srclen) * PGP armor */ -static const char *armor_header = "-----BEGIN PGP MESSAGE-----\n"; -static const char *armor_footer = "\n-----END PGP MESSAGE-----\n"; +static const char *const armor_header = "-----BEGIN PGP MESSAGE-----\n"; +static const char *const armor_footer = "\n-----END PGP MESSAGE-----\n"; /* CRC24 implementation from rfc2440 */ #define CRC24_INIT 0x00b704ceL diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index b42711f574..d03c961678 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -228,7 +228,7 @@ static const FormData_pg_attribute a6 = { .attislocal = true, }; -static const FormData_pg_attribute *SysAtt[] = {&a1, &a2, &a3, &a4, &a5, &a6}; +static const FormData_pg_attribute *const SysAtt[] = {&a1, &a2, &a3, &a4, &a5, &a6}; /* * This function returns a Form_pg_attribute pointer for a system attribute.
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index 97b0ef22ac..4fae852666 100644 --- a/src/backend/utils/adt/ruleutils.c +++ b/src/backend/utils/adt/ruleutils.c @@ -316,9 +316,9 @@ typedef void (*rsv_callback) (Node *node, deparse_context *context, * -- */ static SPIPlanPtr plan_getrulebyoid = NULL; -static const char *query_getrulebyoid = "SELECT * FROM pg_catalog.pg_rewrite WHERE oid = $1"; +static const char *const query_getrulebyoid = "SELECT * FROM pg_catalog.pg_rewrite WHERE oid = $1"; static SPIPlanPtr plan_getviewrule = NULL; -static const char *query_getviewrule = "SELECT * FROM pg_catalog.pg_rewrite WHERE ev_class = $1 AND rulename = $2"; +static const char *const query_getviewrule = "SELECT * FROM pg_catalog.pg_rewrite WHERE ev_class = $1 AND rulename = $2"; /* GUC parameters */ bool quote_all_identifiers = false; diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 84e7ad4d90..c25c697a06 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -112,7 +112,7 @@ typedef struct #error XLOG_BLCKSZ must be between 1KB and 1MB #endif -static const char *memory_units_hint = gettext_noop("Valid units for this parameter are \"B\", \"kB\", \"MB\", \"GB\", and \"TB\"."); +static const char *const memory_units_hint = gettext_noop("Valid units for this parameter are \"B\", \"kB\", \"MB\", \"GB\", and \"TB\"."); static const unit_conversion memory_unit_conversion_table[] = { @@ -149,7 +149,7 @@ static const unit_conversion memory_unit_conversion_table[] = {""}/* end of table marker */ }; -static const char *time_units_hint = gettext_noop("Valid units for this parameter are \"us\", \"ms\", \"s\", \"min\", \"h\", and \"d\"."); +static const char *const time_units_hint = gettext_noop("Valid units for this parameter are \"us\", \"ms\", \"s\", \"min\", \"h\", and \"d\"."); static const unit_conversion time_unit_conversion_table[] = { diff --git a/src/bin/initdb/initdb.c 
b/src/bin/initdb/initdb.c index 905b979947..79548ba06e 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -217,8 +217,8 @@ static bool authwarning = false; * but here it is more convenient to pass it as an environment variable * (no quoting to worry about). */ -static const char *boot_options = "-F -c log_checkpoints=false"; -static const char *backend_options = "--single -F -O -j -c search_path=pg_catalog -c exit_on_error=true -c log_checkpoints=false"; +static const char *const boot_options = "-F -c log_checkpoints=false"; +static const char *const backend_options = "--single -F -O -j -c search_path=pg_catalog -c exit_on_error=true -c log_checkpoints=false"; /* Additional switches to pass to backend (either boot or standalone) */ static char *extra_options = ""; diff
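The distinction the patch draws — pointer-to-const data versus const-pointer-to-const data — in a standalone sketch (not code from the patch; the soundex string is just a convenient constant from the diff above):

```c
#include <string.h>

/* Pointer to const data: the pointee is read-only, but the pointer
 * variable itself can still be reassigned. */
static const char *mutable_ptr = "01230120022455012623010202";

/* Const pointer to const data: neither the pointer nor the data can
 * change, so the whole object can live in read-only storage. */
static const char *const fixed_ptr = "01230120022455012623010202";

const char *
pointer_forms_demo(void)
{
	/* mutable_ptr = "other";  -- would compile: the pointer is writable */
	/* fixed_ptr = "other";    -- would NOT compile: the pointer is const */
	return strcmp(mutable_ptr, fixed_ptr) == 0 ? "match" : "differ";
}
```

Adding the second const is what lets the compiler place these tables in a read-only section and reject accidental reassignment, which is the point of the patch.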
Re: Initdb-time block size specification
On Thu, Aug 31, 2023 at 2:32 PM David Christensen wrote: > Here's a patch atop the series which converts to 16-bit uints and > passes regressions, but I don't consider well-vetted at this point. For what it's worth, my gut reaction to this patch series is similar to that of Andres: I think it will be a disaster. If the disaster is not evident to us, that's far more likely to mean that we've failed to test the right things than it is to mean that there is no disaster. I don't see that there is a lot of upside, either. I don't think we have a lot of evidence that changing the block size is really going to help performance. In fact, my guess is that there are large amounts of code that are heavily optimized, without the authors even realizing it, for 8kB blocks, because that's what we've always had. If we had much larger or smaller blocks, the structure of heap pages or of the various index AMs used for blocks might no longer be optimal, or might be less optimal than they are for an 8kB block size. If you use really large blocks, your blocks may need more internal structure than we have today in order to avoid CPU inefficiencies. I suspect there's been so little testing of non-default block sizes that I wouldn't even count on the code to not be outright buggy. If we could find a safe way to get rid of full page writes, I would certainly agree that that was worth considering. I'm not sure that anything in this thread adds up to that being a reasonable way to go, but the savings would be massive. I feel like the proposal here is a bit like deciding to change the speed limit on all American highways from 65 mph or whatever it is to 130 mph or 32.5 mph and see which way works out best. The whole infrastructure has basically been designed around the current rules. The rate of curvature of the roads is appropriate for the speed that you're currently allowed to drive on them. The vehicles are optimized for long-term operation at about that speed. 
The people who drive the vehicles are accustomed to driving at that speed, and the people who maintain them are accustomed to the problems that happen when you drive them at that speed. Just changing the speed limit doesn't change all that other stuff, and changing all that other stuff is a truly massive undertaking. Maybe this example somewhat overstates the difficulties here, but I do think the difficulties are considerable. The fact that we have 8kB block sizes has affected the thinking of hundreds of developers over decades in making thousands or tens of thousands or hundreds of thousands of decisions about algorithm selection and page format and all kinds of stuff. Even if some other page size seems to work better in a certain context, it's pretty hard to believe that it has much chance of being better overall, even without the added overhead of run-time configuration. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Buildfarm failures on urocryon
Hi, On Fri, Sep 01, 2023 at 11:27:47AM +0530, vignesh C wrote: > Hi, > > Recently urocryon has been failing with the following errors at [1]: > checking for icu-uc icu-i18n... no > configure: error: ICU library not found > If you have ICU already installed, see config.log for details on the > failure. It is possible the compiler isn't looking in the proper directory. > Use --without-icu to disable ICU support. > > configure:8341: checking whether to build with ICU support > configure:8371: result: yes > configure:8378: checking for icu-uc icu-i18n > configure:8440: result: no > configure:8442: error: ICU library not found > If you have ICU already installed, see config.log for details on the > failure. It is possible the compiler isn't looking in the proper directory. > Use --without-icu to disable ICU support. > > Urocryon has been failing for the last 17 days. > > I think ICU libraries need to be installed in urocryon to fix this issue. Oops, that's when I upgraded the build farm client (from v14 to v17). I think it's fixed now... Regards, Mark
Avoid a possible null pointer (src/backend/utils/adt/pg_locale.c)
Hi, If a null locale is reached in these paths, elog will dereference a null pointer. best regards, Ranier Vilela 0001-Avoid-a-possible-dereference-a-null-pointer.patch Description: Binary data
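A minimal sketch of the kind of NULL guard being proposed — the type and function here are invented stand-ins for illustration; the real fix is in the attached patch:

```c
#include <stddef.h>

/* Stand-in for the locale handle used in pg_locale.c. */
typedef struct pg_locale_struct
{
	const char *name;
} *pg_locale_t;

/*
 * Produce a name for an error message.  Checking for NULL before touching
 * locale->name avoids dereferencing a null pointer in the error path.
 */
static const char *
locale_name_for_error(pg_locale_t locale)
{
	if (locale == NULL)
		return "(default)";		/* don't dereference a null handle */
	return locale->name;
}
```

The general point is that an error-reporting path must be at least as defensive as the code it reports on, since it runs precisely when assumptions have already failed.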
Re: trying again to get incremental backup
Hey, thanks for the reply. On Thu, Aug 31, 2023 at 6:50 PM David Steele wrote: > pg_subtrans, at least, can be ignored since it is excluded from the > backup and not required for recovery. I agree... > Welcome to the club! Thanks for the welcome, but being a member feels *terrible*. :-) > I do not. My conclusion back then was that validating a physical > comparison would be nearly impossible without changes to Postgres to > make the primary and standby match via replication. Which, by the way, I > still think would be a great idea. In principle, at least. Replay is > already a major bottleneck and anything that makes it slower will likely > not be very popular. Fair point. But maybe the bigger issue is the work involved. I don't think zeroing the hole in all cases would likely be that expensive, but finding everything that can cause the standby to diverge from the primary and fixing all of it sounds like an unpleasant amount of effort. Still, it's good to know that I'm not missing something obvious. > No objections to 0001/0002. Cool. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Should we use MemSet or {0} for struct initialization?
On Fri, 1 Sept 2023 at 15:25, John Naylor wrote: > On Fri, Sep 1, 2023 at 7:48 PM Jelte Fennema wrote: > > The C standard says: > > > When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values. > > > > So if you set any of the fields after a MemSet, the values of the > > padding bytes that were set to 0 are now unspecified. It seems much > > safer to actually spell out the padding fields of a hash key. > > No, the standard is telling you why you need to memset if consistency of padding bytes matters. Maybe I'm misunderstanding the sentence from the C standard I quoted. But under my interpretation it means that even an assignment to a field of a struct causes the padding bytes to take unspecified (but not undefined) values, because of the "including in a member object" part of the sentence. It's of course possible that all compilers relevant to Postgres never actually change padding when assigning to a field.
Re: Should we use MemSet or {0} for struct initialization?
On 2023-09-01 09:25, John Naylor wrote: On Fri, Sep 1, 2023 at 7:48 PM Jelte Fennema wrote: The C standard says: > When a value is stored in an object of structure or union type, > including in a member object, the bytes of the object representation that > correspond to any padding bytes take unspecified values. So if you set any of the fields after a MemSet, the values of the padding bytes that were set to 0 are now unspecified. It seems much safer to actually spell out the padding fields of a hash key. No, the standard is telling you why you need to memset if consistency of padding bytes matters. Um, I'm in no way a language lawyer for recent C specs, but the language Jelte Fennema quoted is also present in the rather old 9899 TC2 draft I still have around from years back, and in particular it does say that upon assignment, padding bytes ▶take◀ unspecified values, not merely that they retain whatever unspecified values they may have had before. There is a footnote attached (in 9899 TC2) that says "Thus, for example, structure assignment need not copy any padding bits." If that footnote example were normative, it would be reassuring, because you could assume that padding bits not copied are unchanged and remember what you originally memset() them to. So that would be nice. But everything about the form and phrasing of the footnote conveys that it isn't normative. And the normative text does appear to be saying that those padding bytes ▶take◀ unspecified values upon assignment to the object, even if you may have memset() them before. Or at least to be saying that's what could happen, in some implementation on some architecture, and it would be standard-conformant if it did. Perhaps there is language elsewhere in the standard that pins it down to the way you've interpreted it? If you know where that language is, could you point to it? Regards, -Chap
Re: Should we use MemSet or {0} for struct initialization?
On Fri, Sep 1, 2023 at 7:48 PM Jelte Fennema wrote: > The C standard says: > > When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values. > > So if you set any of the fields after a MemSet, the values of the > padding bytes that were set to 0 are now unspecified. It seems much > safer to actually spell out the padding fields of a hash key. No, the standard is telling you why you need to memset if consistency of padding bytes matters. -- John Naylor EDB: http://www.enterprisedb.com
Re: Fix a typo in decode.c
On Fri, Sep 1, 2023 at 5:09 PM Zhijie Hou (Fujitsu) wrote: > > When reading the code, I noticed a typo in the description of WAL record. > > /* > - * Decode XLOG_HEAP2_MULTI_INSERT_insert record into multiple tuplebufs. > + * Decode XLOG_HEAP2_MULTI_INSERT record into multiple tuplebufs. > * > > And attach a small patch to fix it. > LGTM. -- With Regards, Amit Kapila.
Re: sandboxing untrusted code
On Thu, Aug 31, 2023 at 8:57 PM Jeff Davis wrote: > > As a refresher, the scenario I'm talking about is any one in which > > one > > user, who I'll call Bob, does something that results in executing > > code > > provided by another user, who I'll call Alice. The most obvious way > > that this can happen is if Bob performs some operation that targets a > > table owned by Alice. That operation might be DML, like an INSERT or > > UPDATE; or it might be some other kind of maintenance command that > > can > > cause code execution, like REINDEX, which can evaluate index > > expressions. > > REINDEX executes index expressions as the table owner. (You are correct > that INSERT executes index expressions as the inserting user.) I was speaking here of who provided the code, rather than whose credentials were used to execute it. The index expressions are provided by the table owner no matter who evaluates them in a particular case. > > 1. Compute stuff. There's no restriction on the permissible amount of > > compute; if you call untrusted code, nothing prevents it from running > > forever. > > 2. Call other code. This may be done by a function call or a command > > such as CALL or DO, all subject to the usual permissions checks but > > no > > further restrictions. > > 3. Access the current session state, without modifying it. For > > example, executing SHOW or current_setting() is fine. > > 4. Transiently modify the current session state in ways that are > > necessarily reversed before returning to the caller. For example, an > > EXCEPTION block or a configuration change driven by proconfig is > > fine. > > 5. Produce messages at any log level. This includes any kind of > > ERROR. > > Nothing in that list really exercises privileges (except #2?). If those > are the allowed set of things a sandboxed function can do, is a > sandboxed function equivalent to a function running with no privileges > at all? Close but not quite. As you say, #2 does exercise privileges. 
Also, even if no privileges are exercised, you could still refer to CURRENT_ROLE, and I think you could also call a function like has_table_privilege. Your identity hasn't changed, but you're restricted from exercising some of your privileges. Really, you still have them, but they're just not available to you in that situation. > Please explain #2 in a bit more detail. Whose EXECUTE privileges would > be used (I assume it depends on SECURITY DEFINER/INVOKER)? Would the > called code also be sandboxed? Nothing in this proposed system has any impact on whose privileges are used in any particular context, so any privilege checks conducted pursuant to #2 are performed as the same user who would perform them today. Whether the called code would be sandboxed depends on how the rules I articulated in the previous email would apply to it. Since those rules depend on the user IDs, if the called code is owned by the same user as the calling code and is SECURITY INVOKER, then those rules apply in the same way and the same level of sandboxing will apply. But if the called function is owned by a different user or is SECURITY DEFINER, then the rules might apply differently to the called code than the calling code. It's possible this isn't quite good enough and that some adjustments to the rules are necessary; I'm not sure. > Let me qualify that: if the function is written by Alice, and the code > is able to really exercise the privileges of the caller (Bob), then it > seems really hard to make it safe for the caller. > > If the function is sandboxed such that it's not really using Bob's > privileges (it's just nominally running as Bob) that's a much more > tractable problem. Agreed. 
> One complaint (not an objection, because I don't think we have > the luxury of objecting to viable proposals when it comes to improving > our security model): > > Although your proposal sounds like a good security backstop, it feels > like it's missing the point that there are different _kinds_ of > functions. We already have the IMMUTABLE marker and we already have > runtime checks to make sure that immutable functions can't CREATE > TABLE; why not build on that mechanism or create new markers? I haven't ruled that out completely, but there's some subtlety here that doesn't exist in those other cases. If the owner of a function marks it wrongly in terms of volatility or parallel safety, then they might make queries run more slowly than they should, or they might make queries return wrong answers, or error out, or even end up with messed-up indexes. But none of that threatens the stability of the system in any very deep way, or the security of the system. It's no different than putting a CHECK (false) constraint on a table, or something like that: it might make the system not work, and if that happens, then you can fix it. Here, however, we can't trust the owners of functions to label those functions accurately. It won't do for Alice to create a function and then apply the
RE: [PoC] pg_upgrade: allow to upgrade publisher node
Dear Amit, Thank you for reviewing! A new patch is available in [1]. > + /* > + * Note: This must be done after doing the pg_resetwal command because > + * pg_resetwal would remove required WALs. > + */ > + if (count_old_cluster_logical_slots()) > + { > + start_postmaster(_cluster, true); > + create_logical_replication_slots(); > + stop_postmaster(false); > + } > > Can we combine this code with the code in the function > issue_warnings_and_set_wal_level()? That will avoid starting/stopping > the server for creating slots. Yeah, I can. But create_logical_replication_slots() must be done before doing "initdb --sync-only", so it is placed before that. The combined function is named setup_new_cluster(). [1]: https://www.postgresql.org/message-id/TYAPR01MB5866F7D8ED15BA1E8E4A2AB0F5E4A%40TYAPR01MB5866.jpnprd01.prod.outlook.com Best Regards, Hayato Kuroda FUJITSU LIMITED
RE: [PoC] pg_upgrade: allow to upgrade publisher node
Dear Peter, Thanks for reviewing! A new patch is available in [1]. > > == > src/bin/pg_upgrade/check.c > > 1. check_old_cluster_for_valid_slots > > + /* Quick exit if the cluster does not have logical slots. */ > + if (count_old_cluster_logical_slots() == 0) > + return; > > /Quick exit/Quick return/ > > I know they are kind of the same, but the reason I previously > suggested this change was to keep it consistent with the similar > comment that is already in > check_new_cluster_logical_replication_slots(). Fixed. > 2. check_old_cluster_for_valid_slots > > + /* > + * Do additional checks to ensure that confirmed_flush LSN of all the slots > + * is the same as the latest checkpoint location. > + * > + * Note: This can be satisfied only when the old cluster has been shut > + * down, so we skip this live checks. > + */ > + if (!live_check) > > missing word > > /skip this live checks./skip this for live checks./ Fixed. > src/bin/pg_upgrade/controldata.c > > 3. > + /* > + * Read the latest checkpoint location if the cluster is PG17 > + * or later. This is used for upgrading logical replication > + * slots. Currently, we need it only for the old cluster but > + * didn't add additional check for the similicity. > + */ > + if (GET_MAJOR_VERSION(cluster->major_version) >= 1700) > > /similicity/simplicity/ > > SUGGESTION > Currently, we need it only for the old cluster but for simplicity > chose not to have additional checks. Fixed. > src/bin/pg_upgrade/info.c > > 4. get_old_cluster_logical_slot_infos_per_db > > + /* > + * The temporary slots are expressly ignored while checking because such > + * slots cannot exist after the upgrade. During the upgrade, clusters are > + * started and stopped several times so that temporary slots will be > + * removed. 
> + */ > + res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase " > + "FROM pg_catalog.pg_replication_slots " > + "WHERE slot_type = 'logical' AND " > + "wal_status <> 'lost' AND " > + "database = current_database() AND " > + "temporary IS FALSE;"); > > IIUC, the removal of temp slots is just a side-effect of the > start/stop; not the *reason* for the start/stop. So, the last sentence > needs some modification > > BEFORE > During the upgrade, clusters are started and stopped several times so > that temporary slots will be removed. > > SUGGESTION > During the upgrade, clusters are started and stopped several times > causing any temporary slots to be removed. > Fixed. [1]: https://www.postgresql.org/message-id/TYAPR01MB5866F7D8ED15BA1E8E4A2AB0F5E4A%40TYAPR01MB5866.jpnprd01.prod.outlook.com Best Regards, Hayato Kuroda FUJITSU LIMITED
RE: [PoC] pg_upgrade: allow to upgrade publisher node
Dear Dilip, Thank you for reviewing! > > 1. > + conn = connectToServer(_cluster, "template1"); > + > + prep_status("Checking for logical replication slots"); > + > + res = executeQueryOrDie(conn, "SELECT slot_name " > + "FROM pg_catalog.pg_replication_slots " > + "WHERE slot_type = 'logical' AND " > + "temporary IS FALSE;"); > > > I think we should add some comment saying this query will only fetch > logical slots because the database name will always be NULL in the > physical slots. Otherwise looking at the query it is very confusing > how it is avoiding the physical slots. Hmm, the query you pointed out does not check the database of the slot... We are fetching only logical slots by the condition "slot_type = 'logical'", so I think it is too trivial to describe in the comment. Just to confirm - pg_replication_slots can see all the slots even if the database is not the current one.
```
tmp=# SELECT slot_name, slot_type, database FROM pg_replication_slots where database != current_database();
 slot_name | slot_type | database
-----------+-----------+----------
 test      | logical   | postgres
(1 row)
```
If I misunderstood something, please tell me... > 2. > +void > +get_old_cluster_logical_slot_infos(void) > +{ > + int dbnum; > + > + /* Logical slots can be migrated since PG17. */ > + if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600) > + return; > + > + pg_log(PG_VERBOSE, "\nsource databases:"); > > I think we need to change some headings like "slot info source > databases:" Or add an extra message saying printing slot information. > > Before this patch, we were printing all the relation information so > message ordering was quite clear e.g. 
> > source databases: > Database: "template1" > relname: "pg_catalog.pg_largeobject", reloid: 2613, reltblspace: "" > relname: "pg_catalog.pg_largeobject_loid_pn_index", reloid: 2683, > reltblspace: "" > Database: "postgres" > relname: "pg_catalog.pg_largeobject", reloid: 2613, reltblspace: "" > relname: "pg_catalog.pg_largeobject_loid_pn_index", reloid: 2683, > reltblspace: "" > > But after this patch slot information is also getting printed in a > similar fashion so it's very confusing now. Refer > get_db_and_rel_infos() for how it is fetching all the relation > information first and then printing them. > > > > > 3. One more problem is that the slot information and the execute query > messages are intermingled so it becomes more confusing, see the below > example of the latest messaging. I think ideally we should execute > these queries first > and then print all slot information together instead of intermingling > the messages. > > source databases: > executing: SELECT pg_catalog.set_config('search_path', '', false); > executing: SELECT slot_name, plugin, two_phase FROM > pg_catalog.pg_replication_slots WHERE wal_status <> 'lost' AND > database = current_database() AND temporary IS FALSE; > Database: "template1" > executing: SELECT pg_catalog.set_config('search_path', '', false); > executing: SELECT slot_name, plugin, two_phase FROM > pg_catalog.pg_replication_slots WHERE wal_status <> 'lost' AND > database = current_database() AND temporary IS FALSE; > Database: "postgres" > slotname: "isolation_slot1", plugin: "pgoutput", two_phase: 0 > > 4. Looking at the above two comments I feel that now the order should be like > - Fetch all the db infos > get_db_infos() > - loop >get_rel_infos() >get_old_cluster_logical_slot_infos() > > -- and now print relation and slot information per database > print_db_infos() Fixed like that. It seems that we go back to old style... 
Now the debug prints are like below:
```
source databases:
Database: "template1"
relname: "pg_catalog.pg_largeobject", reloid: 2613, reltblspace: ""
relname: "pg_catalog.pg_largeobject_loid_pn_index", reloid: 2683, reltblspace: ""
Database: "postgres"
relname: "pg_catalog.pg_largeobject", reloid: 2613, reltblspace: ""
relname: "pg_catalog.pg_largeobject_loid_pn_index", reloid: 2683, reltblspace: ""
Logical replication slots within the database:
slotname: "old1", plugin: "test_decoding", two_phase: 0
slotname: "old2", plugin: "test_decoding", two_phase: 0
slotname: "old3", plugin: "test_decoding", two_phase: 0
```
Best Regards, Hayato Kuroda FUJITSU LIMITED v30-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch Description: v30-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch v30-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patch Description: v30-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patch
Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
On 9/1/23 10:00, Alexander Lakhin wrote: > Hello Thomas, > > 31.08.2023 14:15, Thomas Munro wrote: > >> We have a signal that is pending and not blocked, so I don't >> immediately know why poll() hasn't returned control. > > When I worked at the Postgres Pro company, we observed a similar lockup > under rather specific conditions (we used the Elbrus CPU and the specific > Elbrus > compiler (lcc), based on EDG). > I managed to reproduce that lockup and Anton Voloshin investigated it. > The issue was caused by a compiler optimization in WaitEventSetWait(): > waiting = true; > ... > while (returned_events == 0) > { > ... > if (set->latch && set->latch->is_set) > { > ... > break; > } > > In that case, the compiler decided that it may place the read > "set->latch->is_set" before the write "waiting = true". > (Placing "pg_compiler_barrier();" just after "waiting = true;" fixed the > issue for us.) > I can't provide more details for now, but maybe you could look at the > binary > code generated on the target platform to confirm or reject my guess. > Hmmm, I'm not very good at reading binary code, but here's what objdump produced for WaitEventSetWait. Maybe someone will see what the issue is. I thought about just adding the barrier to the code, but then how would we know that this was the issue and the barrier fixed it? It happens so rarely that we can't draw any conclusions from a couple of test runs. 
regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company00805ba0 : 805ba0: d102c3ff sub sp, sp, #176 805ba4: 6d0423e9 stp d9, d8, [sp, #64] 805ba8: a9057bfd stp x29, x30, [sp, #80] 805bac: a9066ffc stp x28, x27, [sp, #96] 805bb0: a90767fa stp x26, x25, [sp, #112] 805bb4: a9085ff8 stp x24, x23, [sp, #128] 805bb8: a90957f6 stp x22, x21, [sp, #144] 805bbc: a90a4ff4 stp x20, x19, [sp, #160] 805bc0: 910143fd add x29, sp, #80 805bc4: 717f cmp w3, #0 805bc8: f90007e2 str x2, [sp, #8] 805bcc: 5400240d b.le0x80604c 805bd0: 2a0403f8 mov w24, w4 805bd4: 2a0303f5 mov w21, w3 805bd8: aa0103f3 mov x19, x1 805bdc: aa0003f4 mov x20, x0 805be0: b7f801e1 tbnzx1, #63, 0x805c1c 805be4: 910083e1 add x1, sp, #32 805be8: 52800080 mov w0, #4 805bec: 940623f5 bl 0x98ebc0 805bf0: d35ffe68 lsr x8, x19, #31 805bf4: aa1303f7 mov x23, x19 805bf8: b4000148 cbz x8, 0x805c20 805bfc: 90ffd480 adrpx0, 0x295000 805c00: 91175c00 add x0, x0, #1495 805c04: f0ffd5c1 adrpx1, 0x2c 805c08: 9111f021 add x1, x1, #1148 805c0c: 90ffd822 adrpx2, 0x309000 805c10: 91046842 add x2, x2, #282 805c14: 52807563 mov w3, #939 805c18: 9404de3e bl 0x93d510 805c1c: 92800017 mov x23, #-1 805c20: 9de8 adrpx8, 0x9c1000 805c24: 9f09 adrpx9, 0x9e5000 805c28: 3979e108 ldrbw8, [x8, #3704] 805c2c: 3488 cbz w8, 0x805c3c 805c30: f940f128 ldr x8, [x9, #480] 805c34: b448 cbz x8, 0x805c3c 805c38: b902ad18 str w24, [x8, #684] 805c3c: b0ffdb09 adrpx9, 0x366000 805c40: b0ffdb0a adrpx10, 0x366000 805c44: feeb adrpx11, 0x9e4000 805c48: 52800028 mov w8, #1 805c4c: 52800319 mov w25, #24 805c50: 5280073a mov w26, #57 805c54: fd446128 ldr d8, [x9, #2240] 805c58: 9d7b adrpx27, 0x9b1000 805c5c: fd415949 ldr d9, [x10, #688] 805c60: f9071d68 str x8, [x11, #3640] 805c64: f90003f3 str x19, [sp] 805c68: 1410 b 0x805ca8 805c6c: 9e620100 scvtf d0, x8 805c70: d2d09008 mov x8, #145685290680320 805c74: f2e825c8 movkx8, #16686, lsl #48 805c78: 9e670101 fmovd1, x8 805c7c: d2c80008 mov x8, #70368744177664 805c80: 
f2e811e8 movkx8, #16527, lsl #48 805c84: 1e611800 fdivd0, d0, d1 805c88: 9e620121 scvtf d1, x9 805c8c: 9e670102 fmovd2, x8 805c90: 1f420020 fmadd d0, d1, d2, d0 805c94: 9e780008 fcvtzs x8, d0 805c98: cb080277 sub x23, x19, x8 805c9c: f10006ff cmp x23, #1 805ca0: 540012ab b.lt0x805ef4 805ca4: 35001478 cbnzw24, 0x805f30 805ca8: f9400a88 ldr x8, [x20, #16] 805cac: b468 cbz x8, 0x805cb8 805cb0: f9400108 ldr x8, [x8] 805cb4: b5001248 cbnzx8, 0x805efc 805cb8: f9401280 ldr x0, [x20, #32] 805cbc: 2a1703e2 mov w2, w23 805cc0: b9400281 ldr w1, [x20] 805cc4: 940626eb bl 0x98f870 805cc8: 37f80cc0 tbnzw0, #31, 0x805e60 805ccc: 34001140 cbz w0,
Re: Move bki file pre-processing from initdb to bootstrap
On 01.09.23 14:37, Tom Lane wrote: Krishnakumar R writes: This patch moves the pre-processing for tokens in the bki file from initdb to bootstrap. With these changes the bki file will only be opened once in bootstrap and parsing will be done by the bootstrap parser. You haven't provided any iota of evidence why this would be an improvement. I had played with similar ideas in the past, because it would shave some time off initdb, which would accumulate noticeably over a full test run. But now with the initdb caching mechanism, I wonder whether this is still needed.
Re: Commitfest 2023-09 starts soon
Okay, here we go, starting with: Status summary: Needs review: 227. Waiting on Author: 37. Ready for Committer: 30. Committed: 40. Rejected: 1. Returned with Feedback: 1. Withdrawn: 1. Total: 337. (which is less than CF 2023-07) I have also already applied one round of the waiting-on-author-pruning described below (not included in the above figures). On 31.08.23 10:36, Peter Eisentraut wrote: Commitfest 2023-09 (https://commitfest.postgresql.org/44/) starts in less than 28 hours. If you have any patches you would like considered, be sure to add them in good time. All patch authors, and especially experienced hackers, are requested to make sure the patch status is up to date. If the patch is still being worked on, it might not need to be in "Needs review". Conversely, if you are hoping for a review but the status is "Waiting on author", then it will likely be ignored by reviewers. There are a number of patches carried over from the PG16 development cycle that have been in "Waiting on author" for several months. I will aggressively prune those after the start of this commitfest if there hasn't been any author activity by then.
Re: Unlogged relations and WAL-logging
Is the patch 0003-Remove-unnecessary-smgrimmedsync-when-creating-unlog.patch still relevant, or can this commitfest entry be closed? On 23.08.23 16:40, Heikki Linnakangas wrote: 5. In heapam_relation_set_new_filenode(), we do this: /* * If required, set up an init fork for an unlogged table so that it can * be correctly reinitialized on restart. An immediate sync is required * even if the page has been logged, because the write did not go through * shared_buffers and therefore a concurrent checkpoint may have moved the * redo pointer past our xlog record. Recovery may as well remove it * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE * record. Therefore, logging is necessary even if wal_level=minimal. */ if (persistence == RELPERSISTENCE_UNLOGGED) { Assert(rel->rd_rel->relkind == RELKIND_RELATION || rel->rd_rel->relkind == RELKIND_MATVIEW || rel->rd_rel->relkind == RELKIND_TOASTVALUE); smgrcreate(srel, INIT_FORKNUM, false); log_smgrcreate(newrnode, INIT_FORKNUM); smgrimmedsync(srel, INIT_FORKNUM); } The comment doesn't make much sense: we have neither written nor WAL-logged any page here, with or without the buffer cache. It made more sense before commit fa0f466d53. Well, it seems to me (and perhaps I am just confused) that complaining that there's no page written here might be a technicality. The point is that there's no synchronization between the work we're doing here -- which is creating a fork, not writing a page -- and any concurrent checkpoint. So we both need to log it, and also sync it immediately. I see. I pushed the fix from the other thread that makes smgrcreate() call register_dirty_segment (commit 4b4798e13). I believe that makes this smgrimmedsync() unnecessary. If a concurrent checkpoint happens with a redo pointer greater than this WAL record, it must've received the fsync request created by smgrcreate(). That depends on the fact that we write the WAL record *after* smgrcreate(). Subtle.. 
Hmm, we have a similar smgrimmedsync() call after index build, because we have written pages directly with smgrextend(skipFsync=true). If no checkpoints have occurred during the index build, we could call register_dirty_segment() instead of smgrimmedsync(). That would avoid the fsync() latency when creating an index on an empty or small table. This is all very subtle to get right though. That's why I'd like to invent a new bulk-creation facility that would handle this stuff, and make the callers less error-prone. Having a more generic and less error-prone bulk-creation mechanism is still on my long TODO list..
Re: Should we use MemSet or {0} for struct initialization?
On Thu, 31 Aug 2023 at 13:35, Junwang Zhao wrote: > > > If the struct has padding or aligned, {0} only guarantee the struct > > > members initialized to 0, while memset sets the alignment/padding > > > to 0 as well, but since we will not access the alignment/padding, so > > > they give the same effect. > > > > See above -- if it's used as a hash key, for example, you must clear > > everything. > > Yeah, if memcmp was used as the key comparison function, there is a problem. The C standard says: > When a value is stored in an object of structure or union type, including in > a member object, the bytes of the object representation that correspond to > any padding bytes take unspecified values. So if you set any of the fields after a MemSet, the values of the padding bytes that were set to 0 are now unspecified. It seems much safer to actually spell out the padding fields of a hash key.
Re: Move bki file pre-processing from initdb to bootstrap
Krishnakumar R writes: > This patch moves the pre-processing for tokens in the bki file from > initdb to bootstrap. With these changes the bki file will only be > opened once in bootstrap and parsing will be done by the bootstrap > parser. You haven't provided any iota of evidence why this would be an improvement. regards, tom lane
Re: Assert failure in ATPrepAddPrimaryKey
On 2023-Sep-01, Richard Guo wrote: > I ran into an Assert failure in ATPrepAddPrimaryKey() with the query > below: > > CREATE TABLE t0(c0 boolean); > CREATE TABLE t1() INHERITS(t0); > > # ALTER TABLE t0 ADD CONSTRAINT m EXCLUDE ((1) WITH =); > server closed the connection unexpectedly Ugh, right, I failed to make the new function do nothing for this case; this had no coverage. Fix attached, with some additional test cases based on yours. Thanks for reporting. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "XML!" Exclaimed C++. "What are you doing here? You're not a programming language." "Tell that to the people who use me," said XML. https://burningbird.net/the-parable-of-the-languages/ >From 6df9aa36f250cd3b131efc91e0991fd231597523 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Fri, 1 Sep 2023 13:41:09 +0200 Subject: [PATCH] Only process inheritance for primary keys, not other constraint types --- src/backend/commands/tablecmds.c | 12 +--- src/test/regress/expected/inherit.out | 25 - src/test/regress/sql/inherit.sql | 15 ++- 3 files changed, 47 insertions(+), 5 deletions(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index d097da3c78..0339774672 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -8906,19 +8906,25 @@ ATPrepAddPrimaryKey(List **wqueue, Relation rel, AlterTableCmd *cmd, { List *children; List *newconstrs = NIL; ListCell *lc; - IndexStmt *stmt; + IndexStmt *indexstmt; + + /* No work if not creating a primary key */ + if (!IsA(cmd->def, IndexStmt)) + return; + indexstmt = castNode(IndexStmt, cmd->def); + if (!indexstmt->primary) + return; /* No work if no legacy inheritance children are present */ if (rel->rd_rel->relkind != RELKIND_RELATION || !rel->rd_rel->relhassubclass) return; children = find_inheritance_children(RelationGetRelid(rel), lockmode); - stmt = castNode(IndexStmt, cmd->def); - foreach(lc, stmt->indexParams) + foreach(lc, 
indexstmt->indexParams) { IndexElem *elem = lfirst_node(IndexElem, lc); Constraint *nnconstr; diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out index dae61b9a0b..59583e1e41 100644 --- a/src/test/regress/expected/inherit.out +++ b/src/test/regress/expected/inherit.out @@ -2308,9 +2308,32 @@ create table inh_child (a int primary key); alter table inh_child inherit inh_parent; -- nope ERROR: column "a" in child table must be marked NOT NULL alter table inh_child alter a set not null; alter table inh_child inherit inh_parent; -- now it works -drop table inh_parent, inh_child; +-- don't interfere with other types of constraints +alter table inh_parent add constraint inh_parent_excl exclude ((1) with =); +alter table inh_parent add constraint inh_parent_uq unique (a); +alter table inh_parent add constraint inh_parent_fk foreign key (a) references inh_parent (a); +create table inh_child2 () inherits (inh_parent); +create table inh_child3 (like inh_parent); +alter table inh_child3 inherit inh_parent; +select conrelid::regclass, conname, contype, coninhcount, conislocal + from pg_constraint + where conrelid::regclass::text in ('inh_parent', 'inh_child', 'inh_child2', 'inh_child3') + order by 2, 1; + conrelid |conname| contype | coninhcount | conislocal ++---+-+-+ + inh_child2 | inh_child2_a_not_null | n | 1 | f + inh_child3 | inh_child3_a_not_null | n | 1 | t + inh_child | inh_child_a_not_null | n | 1 | t + inh_child | inh_child_pkey| p | 0 | t + inh_parent | inh_parent_excl | x | 0 | t + inh_parent | inh_parent_fk | f | 0 | t + inh_parent | inh_parent_pkey | p | 0 | t + inh_parent | inh_parent_uq | u | 0 | t +(8 rows) + +drop table inh_parent, inh_child, inh_child2, inh_child3; -- -- test multi inheritance tree -- create table inh_parent(f1 int not null); diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql index 9ceaec1d78..abe8602682 100644 --- a/src/test/regress/sql/inherit.sql +++ 
b/src/test/regress/sql/inherit.sql @@ -845,9 +845,22 @@ create table inh_parent (a int primary key); create table inh_child (a int primary key); alter table inh_child inherit inh_parent; -- nope alter table inh_child alter a set not null; alter table inh_child inherit inh_parent; -- now it works -drop table inh_parent, inh_child; + +-- don't interfere with other types of constraints +alter table inh_parent add constraint inh_parent_excl exclude ((1) with =); +alter table inh_parent add constraint inh_parent_uq unique (a); +alter table inh_parent add constraint inh_parent_fk foreign key (a) references inh_parent (a); +create table
Re: generate syscache info automatically
On Thu, Aug 31, 2023 at 6:23 PM Peter Eisentraut wrote: > Attached is an updated patch set. > > One win here is that the ObjectProperty lookup now uses a hash function > instead of a linear search. So hopefully the performance is better (to > be confirmed) and it could now be used for things like [0]. + # XXX This one neither, but if I add it to @skip, PerfectHash will fail. (???) + #FIXME: AttributeRelationId I took a quick look at this, and attached is the least invasive way to get it working for now, which is to bump the table size slightly. The comment says doing this isn't reliable, but it happens to work in this case. Playing around with the functions is hit-or-miss, and when that fails, somehow the larger table saves the day. -- John Naylor EDB: http://www.enterprisedb.com diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl index 199091305c..3640ee154b 100644 --- a/src/backend/catalog/genbki.pl +++ b/src/backend/catalog/genbki.pl @@ -223,6 +223,7 @@ foreach my $header (@ARGV) # XXX These catalogs were not covered by the previous hand-maintained table. my @skip = qw( + AttributeRelationId AttrDefaultRelationId EnumRelationId IndexRelationId LargeObjectRelationId ParameterAclRelationId PublicationNamespaceRelationId PublicationRelRelationId @@ -235,8 +236,6 @@ foreach my $header (@ARGV) SecLabelRelationId SharedSecLabelRelationId PartitionedRelationId RangeRelationId SequenceRelationId SubscriptionRelRelationId); - # XXX This one neither, but if I add it to @skip, PerfectHash will fail. (???) - #FIXME: AttributeRelationId # XXX hardcoded ObjectType mapping -- where to put this? my %objtypes = ( diff --git a/src/tools/PerfectHash.pm b/src/tools/PerfectHash.pm index e54905a3ef..f343661859 100644 --- a/src/tools/PerfectHash.pm +++ b/src/tools/PerfectHash.pm @@ -200,7 +200,8 @@ sub _construct_hash_table # can be rejected due to touching unused hashtable entries. 
In practice, # neither effect seems strong enough to justify using a larger table.) my $nedges = scalar @keys; # number of edges - my $nverts = 2 * $nedges + 1;# number of vertices + # WIP for object properties + my $nverts = 2 * $nedges + 3;# number of vertices # However, it would be very bad if $nverts were exactly equal to either # $hash_mult1 or $hash_mult2: effectively, that hash function would be
Re: Show inline comments from pg_hba lines in the pg_hba_file_rules view
On 01.09.23 12:44, Michael Paquier wrote: I am not sure what you have in mind, but IMO any solution would live better as long as it is: - not linked to hba.c, handled in a separate code path. - linked to all configuration files where comments are supported, if need be. If I understood you correctly, you mean an independent feature that, e.g., gets raw lines and parses the inline #comments. Doing so we could indeed avoid the trouble of messing around with the hba.c logic, and it would be accessible to other config files. Very interesting thought! It sounds like a much more elegant solution. Perhaps others have more opinions. -- Michael If I hear no objections, I'll try to sketch it as you suggested. Thanks again for the feedback. Jim
Re: SQL:2011 application time
On 9/1/23 11:30, Peter Eisentraut wrote: 1) If I write UNIQUE (a, b, c WITHOUT OVERLAPS), does the WITHOUT OVERLAPS clause attach to the last column, or to the whole column list? In the SQL standard, you can only have one period and it has to be listed last, so this question does not arise. But here we are building a more general facility to then build the SQL facility on top of. So I think it doesn't make sense that the range column must be last or that there can only be one. Also, your implementation requires at least one non-overlaps column, which also seems like a confusing restriction. I think the WITHOUT OVERLAPS clause should be per-column, so that something like UNIQUE (a WITHOUT OVERLAPS, b, c WITHOUT OVERLAPS) would be possible. Then the WITHOUT OVERLAPS clause would directly correspond to the choice between equality or overlaps operator per column. An alternative interpretation would be that WITHOUT OVERLAPS applies to the whole column list, and we would take it to mean, for any range column, use the overlaps operator, for any non-range column, use the equals operator. But I think this would be confusing and would prevent the case of using the equality operator for some ranges and the overlaps operator for some other ranges in the same key. I prefer the first option. That is: WITHOUT OVERLAPS applies only to the column or expression it is attached to, and need not be last in line. -- Vik Fearing
Re: Show inline comments from pg_hba lines in the pg_hba_file_rules view
On Fri, Sep 01, 2023 at 11:32:35AM +0200, Jim Jones wrote: > Would you be in favor of parsing #comments instead? Given that # is > currently already being parsed (ignored), it shouldn't add too much > complexity to the code. I am not sure what you have in mind, but IMO any solution would live better as long as it is: - not linked to hba.c, handled in a separate code path. - linked to all configuration files where comments are supported, if need be. Perhaps others have more opinions. -- Michael
Re: PATCH: Add REINDEX tag to event triggers
On 01.09.23 11:23, jian he wrote: because of the change made here: https://git.postgresql.org/cgit/postgresql.git/diff/src/backend/commands/indexcmds.c?id=11af63fb48d278b86aa948a5b57f136ef03c2bb7 I manually slightly edited the patch content: from static List *ChooseIndexColumnNames(List *indexElems); static void ReindexIndex(RangeVar *indexRelation, ReindexParams *params, bool isTopLevel); to static List *ChooseIndexColumnNames(const List *indexElems); static void ReindexIndex(const RangeVar *indexRelation, const ReindexParams *params, bool isTopLevel); Can you try the attached one? The patch applies cleanly. However, now the compiler complains about a qualifier mismatch: indexcmds.c:2707:33: warning: assignment discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers] 2707 | currentReindexStatement = stmt; Perhaps 'const ReindexStmt *currentReindexStatement'? Tests also work just fine. I also tested it against unique and partial indexes, and it worked as expected. Jim
Fix a typo in decode.c
Hi, When reading the code, I noticed a typo in the description of a WAL record:

/*
- * Decode XLOG_HEAP2_MULTI_INSERT_insert record into multiple tuplebufs.
+ * Decode XLOG_HEAP2_MULTI_INSERT record into multiple tuplebufs.
 *

Attached is a small patch to fix it. Best Regards, Hou Zhijie 0001-fix-typo-in-decode.c.patch
Re: Show inline comments from pg_hba lines in the pg_hba_file_rules view
Hi Michael On 01.09.23 03:18, Michael Paquier wrote: hba.c is complex enough these days (inclusion logic, tokenization of the items) that I am not in favor of touching its code paths for anything like that. This is not something that can apply only to pg_hba.conf, but to all configuration files. It is indeed possible to extrapolate it to any configuration file, but my point was rather to visualize comments purposefully left by the DBA regarding user access (pg_hba and pg_ident). And this touches in adding support for a second type of comment format. This is one of these areas where we may want a smarter version of pg_read_file that returns a SRF for (line_number, line_contents) of a file read? Note that it is possible to add comments at the end of a HBA entry already, like: local all all trust # My comment, and this is a correct HBA entry. I also considered parsing the inline #comments - actually it was my first idea - but I thought it would leave no option to make an inline comment without populating pg_hba_file_rules. But I guess in this case one could always write the comment in the line above :) Would you be in favor of parsing #comments instead? Given that # is currently already being parsed (ignored), it shouldn't add too much complexity to the code. Thanks for the feedback. Jim
Re: SQL:2011 application time
On 31.08.23 23:26, Paul Jungwirth wrote: I've tried to clean up the first four patches to get them ready for committing, since they could get committed before the PERIOD patch. I think there is a little more cleanup needed but they should be ready for a review. Looking at the patch 0001 "Add temporal PRIMARY KEY and UNIQUE constraints": Generally, this looks like a good direction. The patch looks comprehensive, with documentation and tests, and appears to cover all the required pieces (client programs, ruleutils, etc.). I have two conceptual questions that should be clarified before we go much further: 1) If I write UNIQUE (a, b, c WITHOUT OVERLAPS), does the WITHOUT OVERLAPS clause attach to the last column, or to the whole column list? In the SQL standard, you can only have one period and it has to be listed last, so this question does not arise. But here we are building a more general facility to then build the SQL facility on top of. So I think it doesn't make sense that the range column must be last or that there can only be one. Also, your implementation requires at least one non-overlaps column, which also seems like a confusing restriction. I think the WITHOUT OVERLAPS clause should be per-column, so that something like UNIQUE (a WITHOUT OVERLAPS, b, c WITHOUT OVERLAPS) would be possible. Then the WITHOUT OVERLAPS clause would directly correspond to the choice between equality or overlaps operator per column. An alternative interpretation would be that WITHOUT OVERLAPS applies to the whole column list, and we would take it to mean, for any range column, use the overlaps operator, for any non-range column, use the equals operator. But I think this would be confusing and would prevent the case of using the equality operator for some ranges and the overlaps operator for some other ranges in the same key. 2) The logic hinges on get_index_attr_temporal_operator(), to pick the equality and overlaps operator for each column. 
For btree indexes, the strategy numbers are fixed, so this is straightforward. But for gist indexes, the strategy numbers are more like recommendations. Are we comfortable with how this works? I mean, we could say, if you want to be able to take advantage of the WITHOUT OVERLAPS syntax, you have to use these numbers, otherwise you're on your own. It looks like the gist strategy numbers are already hardcoded in a number of places, so maybe that's all okay, but I feel we should be more explicit about this somewhere, maybe in the documentation, or at least in code comments. Besides that, some stylistic comments: * There is a lot of talk about "temporal" in this patch, but this functionality is more general than temporal. I would prefer to change this to more neutral terms like "overlaps". * The field ii_Temporal in IndexInfo doesn't seem necessary and could be handled via local variables. See [0] for a similar discussion: [0]: https://www.postgresql.org/message-id/flat/f84640e3-00d3-5abd-3f41-e6a19d33c...@eisentraut.org * In gram.y, change withoutOverlapsClause -> without_overlaps_clause for consistency with the surrounding code. * No-op assignments like n->without_overlaps = NULL; can be omitted. (Or you should put them everywhere. But only in some places seems inconsistent and confusing.)
Re: PATCH: Add REINDEX tag to event triggers
On Wed, Aug 30, 2023 at 8:38 PM Jim Jones wrote: > > Greetings > > > > I was unable to apply it in 7ef5f5f > > $ git apply -v v2-0001-Add-REINDEX-tag-to-event-triggers.patch > Checking patch doc/src/sgml/event-trigger.sgml... > Checking patch src/backend/commands/indexcmds.c... > error: while searching for: > static List *ChooseIndexColumnNames(List *indexElems); > static void ReindexIndex(RangeVar *indexRelation, ReindexParams *params, > bool isTopLevel); > static void RangeVarCallbackForReindexIndex(const RangeVar *relation, > > Oid relId, Oid oldRelId, void *arg); > static Oid ReindexTable(RangeVar *relation, ReindexParams *params, > > error: patch failed: src/backend/commands/indexcmds.c:94 > error: src/backend/commands/indexcmds.c: patch does not apply > Checking patch src/backend/tcop/utility.c... > Checking patch src/include/tcop/cmdtaglist.h... > Checking patch src/test/regress/expected/event_trigger.out... > Hunk #1 succeeded at 556 (offset 2 lines). > Checking patch src/test/regress/sql/event_trigger.sql... > > Am I missing something? > > Jim That is because of the change made here: https://git.postgresql.org/cgit/postgresql.git/diff/src/backend/commands/indexcmds.c?id=11af63fb48d278b86aa948a5b57f136ef03c2bb7 I manually slightly edited the patch content: from static List *ChooseIndexColumnNames(List *indexElems); static void ReindexIndex(RangeVar *indexRelation, ReindexParams *params, bool isTopLevel); to static List *ChooseIndexColumnNames(const List *indexElems); static void ReindexIndex(const RangeVar *indexRelation, const ReindexParams *params, bool isTopLevel); Can you try the attached one?
diff --git a/doc/src/sgml/event-trigger.sgml b/doc/src/sgml/event-trigger.sgml
index 3b6a5361..aa1b359f 100644
--- a/doc/src/sgml/event-trigger.sgml
+++ b/doc/src/sgml/event-trigger.sgml
@@ -1013,6 +1013,14 @@
     -
+
+    REINDEX
+    X
+    X
+    -
+    -
+
+
     REVOKE
     X
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index ab8b81b3..b78f1d87 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -95,6 +95,8 @@ static char *ChooseIndexNameAddition(const List *colnames);
 static List *ChooseIndexColumnNames(const List *indexElems);
 static void ReindexIndex(const RangeVar *indexRelation, const ReindexParams *params,
 						 bool isTopLevel);
+static void reindex_event_trigger_collect(Oid oid);
+static void reindex_event_trigger_collect_relation(Oid oid);
 static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
 											Oid relId, Oid oldRelId, void *arg);
 static Oid	ReindexTable(const RangeVar *relation, const ReindexParams *params,
@@ -111,6 +113,8 @@ static bool ReindexRelationConcurrently(Oid relationOid,
 static void update_relispartition(Oid relationId, bool newval);
 static inline void set_indexsafe_procflags(void);
 
+static ReindexStmt *currentReindexStatement = NULL;
+
 /*
  * callback argument type for RangeVarCallbackForReindexIndex()
  */
@@ -2696,6 +2700,12 @@ ExecReindex(ParseState *pstate, const ReindexStmt *stmt, bool isTopLevel)
 	bool		verbose = false;
 	char	   *tablespacename = NULL;
 
+	/*
+	 * Make current stmt available for event triggers without directly passing
+	 * the context to every subsequent call.
+	 */
+	currentReindexStatement = stmt;
+
 	/* Parse option list */
 	foreach(lc, stmt->params)
 	{
@@ -2776,6 +2786,12 @@
 					 (int) stmt->kind);
 			break;
 	}
+
+	/*
+	 * Clear the reindex stmt global reference now that triggers should have
+	 * completed
+	 */
+	currentReindexStatement = NULL;
 }
 
 /*
@@ -2827,6 +2843,8 @@ ReindexIndex(const RangeVar *indexRelation, const ReindexParams *params, bool is
 		newparams.options |= REINDEXOPT_REPORT_PROGRESS;
 		reindex_index(indOid, false, persistence, &newparams);
+		/* Add the index to event trigger */
+		reindex_event_trigger_collect(indOid);
 	}
 }
@@ -2950,6 +2968,9 @@ ReindexTable(const RangeVar *relation, const ReindexParams *params, bool isTopLe
 			ereport(NOTICE,
 					(errmsg("table \"%s\" has no indexes to reindex",
 							relation->relname)));
+
+		/* Create even for the indexes being modified */
+		reindex_event_trigger_collect_relation(heapOid);
 	}
 
 	return heapOid;
@@ -3366,6 +3387,8 @@ ReindexMultipleInternal(const List *relids, const ReindexParams *params)
 			newparams.options |= REINDEXOPT_REPORT_PROGRESS | REINDEXOPT_MISSING_OK;
 			reindex_index(relid, false, relpersistence, &newparams);
+			/* Add the index to event trigger */
+			reindex_event_trigger_collect(relid);
 			PopActiveSnapshot();
 			/* reindex_index() does the verbose output */
 		}
@@ -3387,6 +3410,9 @@ ReindexMultipleInternal(const List *relids, const ReindexParams
Re: persist logical slots to disk during shutdown checkpoint
On Fri, Sep 1, 2023 at 1:11 PM Ashutosh Bapat wrote: > > On Thu, Aug 31, 2023 at 7:28 PM Amit Kapila wrote: > > > > On Thu, Aug 31, 2023 at 6:12 PM Ashutosh Bapat > > wrote: > > > > > > > + qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), > > > > reading WAL from ([A-F0-9]+\/[A-F0-9]+)./ > > > > > > I don't think the LSN reported in this message is guaranteed to be the > > > confirmed_flush LSN of the slot. It's usually confirmed_flush but not > > > always. It's the LSN that snapshot builder computes based on factors > > > including confirmed_flush. There's a chance that this test will fail > > > sometimes because of this behaviour. > > > > > > > I think I am missing something here because as per my understanding, > > the LOG referred by the test is generated in CreateDecodingContext() > > before which we shouldn't be changing the slot's confirmed_flush LSN. > > The LOG [1] refers to the slot's persistent value for confirmed_flush, > > so how it could be different from what the test is expecting. > > > > [1] > > errdetail("Streaming transactions committing after %X/%X, reading WAL > > from %X/%X.", > >LSN_FORMAT_ARGS(slot->data.confirmed_flush), > > I was afraid that we may move confirmed_flush while creating the > snapshot builder when creating the decoding context. But I don't see > any code doing that. So maybe we are safe. > We are safe in that respect. As far as I understand there is no reason to be worried. > But if the log message > changes, this test would fail - depending upon the log message looks a > bit fragile, esp. when we have a way to access the data directly > reliably. > This message has been there from the very beginning (b89e1510) and I can't foresee a reason to change such a message. But even if it changes, we can always adjust the test output or the test accordingly, if required. I think it is a matter of preference as to which way we write the test, so let's not argue too much on this. 
I find the current way slightly more reliable, but we can change it if we see any problem. -- With Regards, Amit Kapila.
Re: Eliminate redundant tuple visibility check in vacuum
Hi, On 9/1/23 03:25, Peter Geoghegan wrote: On Thu, Aug 31, 2023 at 3:35 PM Melanie Plageman wrote: Any inserting transaction which aborts after heap_page_prune()'s visibility check will now be of no concern to lazy_scan_prune(). Since we don't do the visibility check again, we won't find the tuple HEAPTUPLE_DEAD and thus won't have the problem of adding the tuple to the array of tuples for vacuum to reap. This does mean that we won't reap and remove tuples whose inserting transactions abort right after heap_page_prune()'s visibility check. But, this doesn't seem like an issue. It's definitely not an issue. The loop is essentially a hacky way of getting a consistent picture of which tuples should be treated as HEAPTUPLE_DEAD, and which tuples need to be left behind (consistent at the level of each page and each HOT chain, at least). Making that explicit seems strictly better. They may not be removed until the next vacuum, but ISTM it is actually worse to pay the cost of re-pruning the whole page just to clean up that one tuple. Maybe if that renders the page all visible and we can mark it as such in the visibility map -- but that seems like a relatively narrow use case. The chances of actually hitting the retry are microscopic anyway. It has nothing to do with making sure that dead tuples from aborted tuples get removed for its own sake, or anything. Rather, the retry is all about making sure that all TIDs that get removed from indexes can only point to LP_DEAD stubs. Prior to Postgres 14, HEAPTUPLE_DEAD tuples with storage would very occasionally be left behind, which made life difficult in a bunch of other places -- for no good reason. That makes sense and seems like a much better compromise. Thanks for the explanations. Please update the comment to document the corner case and how we handle it. -- David Geier (ServiceNow)
Re: Synchronizing slots from primary to standby
Hi Shveta. Here are some comments for patch v14-0002 The patch is large, so my code review is a WIP... more later next week... == GENERAL 1. Patch size The patch is 2700 lines. Is it possible to break this up into smaller self-contained parts to make the reviews more manageable? ~~~ 2. PG Docs I guess there are missing PG Docs for this patch. E.g. there are new GUCs added but I see no documentation yet for them. ~ 3. Terms There are variations of what to call the sync worker - "Slot sync worker" or - "slot-sync worker" or - "slot synchronization worker" or - "slot-synchronization worker" - and others These are all in the comments and messages etc. Better to search/replace to make a consistent term everywhere. FWIW, I preferred just to call it "slot-sync worker". ~ 4. typedefs I think multiple new typedefs are added by this patch. IIUC, those should be included in the file typedefs.list so that pgindent will work properly. 5. max_slot_sync_workers GUC There is already a 'max_sync_workers_per_subscription', but that one is for "tablesync" workers. IMO it is potentially confusing now that both these GUCs have 'sync_workers' in the name. I think it would be less ambiguous to change your new GUC to 'max_slotsync_workers'. == Commit Message 6. Overview? I felt the information in this commit message is describing details of what changes are in this patch but there is no synopsis about the *purpose* of this patch as a whole. E.g. what is it for? It seemed like there should be some introductory paragraph up-front before describing all the specifics. ~~~ 7. For slots to be synchronised, another GUC is added: synchronize_slot_names: This is a runtime modifiable GUC. ~ If this is added by this patch then how come there is some SGML describing the same GUC in patch 14-0001? What is the relationship? ~~~ 8. 
Let us say slots mentioned in 'synchronize_slot_names' on primary belongs to 10 DBs and say the new GUC is set at default value of 2, then each worker will manage 5 dbs and will keep on synching the slots for them. ~ /the new GUC is set at default value of 2/'max_slot_sync_workers' is 2/ ~~~ 9. If a new DB is found by replication launcher, it will assign this new db to the worker handling the minimum number of dbs currently (or first worker in case of equal count) ~ Hmm. Isn't this only describing cases where max_slot_workers was exceeded? Otherwise, you should just launch a brand new sync-worker, right? ~~~ 10. Each worker slot will have its own dbids list. ~ It seems confusing to say "worker slot" when already talking about workers and slots. Can you reword that more like "Each slot-sync worker will have its own dbids list"? == src/backend/postmaster/bgworker.c == .../libpqwalreceiver/libpqwalreceiver.c 11. libpqrcv_list_db_for_logical_slots +/* + * List DB for logical slots + * + * It gets the list of unique DBIDs for logical slots mentioned in slot_names + * from primary. + */ +static List * +libpqrcv_list_db_for_logical_slots(WalReceiverConn *conn, Comment needs some minor tweaking. ~~~ 12. + if (strcmp(slot_names, "") != 0 && strcmp(slot_names, "*") != 0) + { + char*rawname; + List*namelist; + ListCell *lc; + + appendStringInfoChar(, ' '); + rawname = pstrdup(slot_names); + SplitIdentifierString(rawname, ',', ); + foreach (lc, namelist) + { + if (lc != list_head(namelist)) + appendStringInfoChar(, ','); + appendStringInfo(, "%s", + quote_identifier(lfirst(lc))); + } + } /rawname/rawnames/ ~~~ 13. + if (PQresultStatus(res) != PGRES_TUPLES_OK) + { + PQclear(res); + ereport(ERROR, + (errmsg("could not receive list of slots the primary server: %s", + pchomp(PQerrorMessage(conn->streamConn); + } /the primary server/from the primary server/ ~~~ 14. 
+ if (PQnfields(res) < 1) + { + int nfields = PQnfields(res); + + PQclear(res); + ereport(ERROR, + (errmsg("invalid response from primary server"), + errdetail("Could not get list of slots: got %d fields, " + "expected %d or more fields.", +nfields, 1))); + } This code seems over-complicated. If it is < 1 then it can only be zero, right? So then what is the point of calculating and displaying the 'nfields' which can only be 0? ~~~ 15. + ntuples = PQntuples(res); + for (int i = 0; i < ntuples; i++) + { + + slot_data = palloc0(sizeof(WalRecvReplicationSlotDbData)); + if (!PQgetisnull(res, i, 0)) + slot_data->database = atooid(PQgetvalue(res, i, 0)); + + slot_data->last_sync_time = 0; + slotlist = lappend(slotlist, slot_data); + } 15a. Unnecessary blank line in for-block. ~~~ 15b. Unnecessary assignment to 'last_sync_time' because the whole structure was palloc0 just 2 lines above. == src/backend/replication/logical/Makefile == src/backend/replication/logical/launcher.c 16. +/* + * Initial and incremental allocation size for dbids array for each + * SlotSyncWorker in dynamic shared memory i.e. we start with this size + * and once it is exhausted, dbids is rellocated with size
PATCH: document for regression test forgets libpq test
Hi, I found a small mistake in the documentation, in Section 33.1.3 (Additional Test Suites). > The additional tests that can be invoked this way include: The list doesn't include interface/libpq/test. I attached a patch. Thank you. Best Regards Ryo Matsumura Fujitsu Limited regress_doc_fix.diff
Move bki file pre-processing from initdb to bootstrap
Hi All, This patch moves the pre-processing for tokens in the bki file from initdb to bootstrap. With these changes the bki file will only be opened once, in bootstrap, and parsing will be done by the bootstrap parser. The flow of bki file processing will be as follows: - In initdb, gather the values used to replace the tokens in the bki file. - Pass these values into postgres bootstrap startup as key-value pairs, using the '-i' option. - In bootstrap, open the bki file (the bki file name was received as a parameter). - During the parsing of the bki file, replace the tokens received as parameters with their values. Related discussion can be found here: https://www.postgresql.org/message-id/20220216021219.ygzrtb3hd5bn7olz%40alap3.anarazel.de Note: Currently the patch breaks on Windows due to placement of extra quotes when passing parameters (thanks to Thomas Munro for helping me find that). Will follow up with v2 fixing the Windows issues with parameter passing, along with format fixes. Please review and provide feedback. -- Thanks and Regards, Krishnakumar (KK). [Microsoft] v1-0001-Move-the-pre-processing-for-tokens-in-bki-file-fr.patch
Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
Hello Thomas, 31.08.2023 14:15, Thomas Munro wrote: We have a signal that is pending and not blocked, so I don't immediately know why poll() hasn't returned control. When I worked at the Postgres Pro company, we observed a similar lockup under rather specific conditions (we used an Elbrus CPU and the specific Elbrus compiler (lcc) based on edg). I managed to reproduce that lockup and Anton Voloshin investigated it. The issue was caused by a compiler optimization in WaitEventSetWait():

waiting = true;
...
while (returned_events == 0)
{
    ...
    if (set->latch && set->latch->is_set)
    {
        ...
        break;
    }

In that case, the compiler decided that it may place the read of "set->latch->is_set" before the write "waiting = true". (Placing "pg_compiler_barrier();" just after "waiting = true;" fixed the issue for us.) I can't provide more details for now, but maybe you could look at the binary code generated on the target platform to confirm or reject my guess. Best regards, Alexander
Re: [PoC] pg_upgrade: allow to upgrade publisher node
Here are some review comments for v29-0002 == src/bin/pg_upgrade/check.c 1. check_old_cluster_for_valid_slots + /* Quick exit if the cluster does not have logical slots. */ + if (count_old_cluster_logical_slots() == 0) + return; /Quick exit/Quick return/ I know they are kind of the same, but the reason I previously suggested this change was to keep it consistent with the similar comment that is already in check_new_cluster_logical_replication_slots(). ~~~ 2. check_old_cluster_for_valid_slots + /* + * Do additional checks to ensure that confirmed_flush LSN of all the slots + * is the same as the latest checkpoint location. + * + * Note: This can be satisfied only when the old cluster has been shut + * down, so we skip this live checks. + */ + if (!live_check) missing word /skip this live checks./skip this for live checks./ == src/bin/pg_upgrade/controldata.c 3. + /* + * Read the latest checkpoint location if the cluster is PG17 + * or later. This is used for upgrading logical replication + * slots. Currently, we need it only for the old cluster but + * didn't add additional check for the similicity. + */ + if (GET_MAJOR_VERSION(cluster->major_version) >= 1700) /similicity/simplicity/ SUGGESTION Currently, we need it only for the old cluster but for simplicity chose not to have additional checks. == src/bin/pg_upgrade/info.c 4. get_old_cluster_logical_slot_infos_per_db + /* + * The temporary slots are expressly ignored while checking because such + * slots cannot exist after the upgrade. During the upgrade, clusters are + * started and stopped several times so that temporary slots will be + * removed. + */ + res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase " + "FROM pg_catalog.pg_replication_slots " + "WHERE slot_type = 'logical' AND " + "wal_status <> 'lost' AND " + "database = current_database() AND " + "temporary IS FALSE;"); IIUC, the removal of temp slots is just a side-effect of the start/stop; not the *reason* for the start/stop. 
So, the last sentence needs some modification BEFORE During the upgrade, clusters are started and stopped several times so that temporary slots will be removed. SUGGESTION During the upgrade, clusters are started and stopped several times causing any temporary slots to be removed. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Incremental View Maintenance, take 2
hi based on v29. based on https://stackoverflow.com/a/4014981/1560347: I added a new function append_update_set_clause, and deleted functions: {append_set_clause_for_count, append_set_clause_for_sum, append_set_clause_for_avg, append_set_clause_for_minmax} I guess this way is more extensible/generic than yours. I replaced the following code with the generic function append_update_set_clause:

+ /* For views with aggregates, we need to build SET clause for updating aggregate
+  * values. */
+ if (query->hasAggs && IsA(tle->expr, Aggref))
+ {
+     Aggref *aggref = (Aggref *) tle->expr;
+     const char *aggname = get_func_name(aggref->aggfnoid);
+
+     /*
+      * We can use function names here because it is already checked if these
+      * can be used in IMMV by its OID at the definition time.
+      */
+
+     /* count */
+     if (!strcmp(aggname, "count"))
+         append_set_clause_for_count(resname, aggs_set_old, aggs_set_new, aggs_list_buf);
+
+     /* sum */
+     else if (!strcmp(aggname, "sum"))
+         append_set_clause_for_sum(resname, aggs_set_old, aggs_set_new, aggs_list_buf);
+
+     /* avg */
+     else if (!strcmp(aggname, "avg"))
+         append_set_clause_for_avg(resname, aggs_set_old, aggs_set_new, aggs_list_buf,
+                                   format_type_be(aggref->aggtype));
+
+     else
+         elog(ERROR, "unsupported aggregate function: %s", aggname);
+ }

--<<< Attached is my refactor. There are some whitespace errors in the patches; you need to use git apply --reject --whitespace=fix basedon_v29_matview_c_refactor_update_set_clause.patch. Also, your patch cannot be applied with git apply; I finally figured out how to bulk-apply using GNU patch, from https://serverfault.com/questions/102324/apply-multiple-patch-files. Previously I just did it manually one by one. I think if you use GNU patch with { for i in $PATCHES/v29*.patch; do patch -p1 < $i; done }, it will generate an .orig file for every modified file? -< src/backend/commands/matview.c 2268: /* For tuple deletion */ maybe "/* For tuple deletion and update */" is more accurate? 
-< currently at here: src/test/regress/sql/incremental_matview.sql 98: -- support SUM(), COUNT() and AVG() aggregate functions 99: BEGIN; 100: CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_agg AS SELECT i, SUM(j), COUNT(i), AVG(j) FROM mv_base_a GROUP BY i; 101: SELECT * FROM mv_ivm_agg ORDER BY 1,2,3,4; 102: INSERT INTO mv_base_a VALUES(2,100); src/backend/commands/matview.c 2858: if (SPI_exec(querybuf.data, 0) != SPI_OK_INSERT) 2859: elog(ERROR, "SPI_exec failed: %s", querybuf.data); then I debug, print out querybuf.data: WITH updt AS (UPDATE public.mv_ivm_agg AS mv SET __ivm_count__ = mv.__ivm_count__ OPERATOR(pg_catalog.+) diff.__ivm_count__ , sum = (CASE WHEN mv.__ivm_count_sum__ OPERATOR(pg_catalog.=) 0 AND diff.__ivm_count_sum__ OPERATOR(pg_catalog.=) 0 THEN NULL WHEN mv.sum IS NULL THEN diff.sum WHEN diff.sum IS NULL THEN mv.sum ELSE (mv.sum OPERATOR(pg_catalog.+) diff.sum) END), __ivm_count_sum__ = (mv.__ivm_count_sum__ OPERATOR(pg_catalog.+) diff.__ivm_count_sum__), count = (mv.count OPERATOR(pg_catalog.+) diff.count), avg = (CASE WHEN mv.__ivm_count_avg__ OPERATOR(pg_catalog.=) 0 AND diff.__ivm_count_avg__ OPERATOR(pg_catalog.=) 0 THEN NULL WHEN mv.__ivm_sum_avg__ IS NULL THEN diff.__ivm_sum_avg__ WHEN diff.__ivm_sum_avg__ IS NULL THEN mv.__ivm_sum_avg__ ELSE (mv.__ivm_sum_avg__ OPERATOR(pg_catalog.+) diff.__ivm_sum_avg__)::numeric END) OPERATOR(pg_catalog./) (mv.__ivm_count_avg__ OPERATOR(pg_catalog.+) diff.__ivm_count_avg__), __ivm_sum_avg__ = (CASE WHEN mv.__ivm_count_avg__ OPERATOR(pg_catalog.=) 0 AND diff.__ivm_count_avg__ OPERATOR(pg_catalog.=) 0 THEN NULL WHEN mv.__ivm_sum_avg__ IS NULL THEN diff.__ivm_sum_avg__ WHEN diff.__ivm_sum_avg__ IS NULL THEN mv.__ivm_sum_avg__ ELSE (mv.__ivm_sum_avg__ OPERATOR(pg_catalog.+) diff.__ivm_sum_avg__) END), __ivm_count_avg__ = (mv.__ivm_count_avg__ OPERATOR(pg_catalog.+) diff.__ivm_count_avg__) FROM new_delta AS diff WHERE (mv.i OPERATOR(pg_catalog.=) diff.i OR (mv.i IS NULL AND diff.i IS NULL)) RETURNING mv.i) 
INSERT INTO public.mv_ivm_agg (i, sum, count, avg, __ivm_count_sum__, __ivm_count_avg__, __ivm_sum_avg__, __ivm_count__) SELECT i, sum, count, avg, __ivm_count_sum__, __ivm_count_avg__, __ivm_sum_avg__, __ivm_count__ FROM new_delta AS diff WHERE NOT EXISTS (SELECT 1 FROM updt AS mv WHERE (mv.i OPERATOR(pg_catalog.=) diff.i OR (mv.i IS NULL AND diff.i IS NULL))); At this final SPI_exec, we have a update statement with related columns { __ivm_count_sum__, sum, __ivm_count__, count, avg, __ivm_sum_avg__, __ivm_count_avg__}. At this time, my mind stops working, querybuf.data is way too big, but I still feel like there is some logic associated with these columns, maybe we can use it as an assertion to prove that this query (querybuf.len = 1834) is indeed correct. Since the apply delta query is quite complex, I feel like adding some "if debug then print out the final querybuf.data end if" would be a good idea. we add hidden columns somewhere, also to avoid corner cases,
Re: persist logical slots to disk during shutdown checkpoint
On Thu, Aug 31, 2023 at 7:28 PM Amit Kapila wrote: > > On Thu, Aug 31, 2023 at 6:12 PM Ashutosh Bapat > wrote: > > > > On Thu, Aug 31, 2023 at 2:52 PM Amit Kapila wrote: > > > > > > I > > > > > > think we should shut down subscriber, restart publisher and then > > > > > > make this > > > > > > check based on the contents of the replication slot instead of > > > > > > server log. > > > > > > Shutting down subscriber will ensure that the subscriber won't send > > > > > > any new > > > > > > confirmed flush location to the publisher after restart. > > > > > > > > > > > > > > > > But if we shutdown the subscriber before the publisher there is no > > > > > guarantee that the publisher has sent all outstanding logs up to the > > > > > shutdown checkpoint record (i.e., the latest record). Such a guarantee > > > > > can only be there if we do a clean shutdown of the publisher before > > > > > the subscriber. > > > > > > > > So the sequence is shutdown publisher node, shutdown subscriber node, > > > > start publisher node and carry out the checks. > > > > > > > > > > This can probably work but I still prefer the current approach as that > > > will be closer to the ideal values on the disk instead of comparison > > > with a later in-memory value of confirmed_flush LSN. Ideally, if we > > > would have a tool like pg_replslotdata which can read the on-disk > > > state of slots that would be better but missing that, the current one > > > sounds like the next best possibility. Do you see any problem with the > > > current approach of test? > > > > > + qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+), > > > reading WAL from ([A-F0-9]+\/[A-F0-9]+)./ > > > > I don't think the LSN reported in this message is guaranteed to be the > > confirmed_flush LSN of the slot. It's usually confirmed_flush but not > > always. It's the LSN that snapshot builder computes based on factors > > including confirmed_flush. 
There's a chance that this test will fail
> > sometimes because of this behaviour.
>
> I think I am missing something here because as per my understanding,
> the LOG referred by the test is generated in CreateDecodingContext(),
> before which we shouldn't be changing the slot's confirmed_flush LSN.
> The LOG [1] refers to the slot's persistent value for confirmed_flush,
> so how could it be different from what the test is expecting?
>
> [1]
> errdetail("Streaming transactions committing after %X/%X, reading WAL
> from %X/%X.",
> LSN_FORMAT_ARGS(slot->data.confirmed_flush),

I was afraid that we may move confirmed_flush while creating the snapshot
builder as part of creating the decoding context. But I don't see any code
doing that, so maybe we are safe. But if the log message changes, this test
would fail - depending upon the log message seems a bit fragile, esp. when we
have a way to access the data directly and reliably.

--
Best Wishes,
Ashutosh Bapat
Re: [PoC] pg_upgrade: allow to upgrade publisher node
On Fri, Sep 1, 2023 at 10:16 AM Hayato Kuroda (Fujitsu) wrote:
>
> +	/*
> +	 * Note: This must be done after doing the pg_resetwal command because
> +	 * pg_resetwal would remove required WALs.
> +	 */
> +	if (count_old_cluster_logical_slots())
> +	{
> +		start_postmaster(&old_cluster, true);
> +		create_logical_replication_slots();
> +		stop_postmaster(false);
> +	}

Can we combine this code with the code in the function
issue_warnings_and_set_wal_level()? That will avoid starting/stopping
the server for creating slots.

--
With Regards,
Amit Kapila.
Assert failure in ATPrepAddPrimaryKey
I ran into an Assert failure in ATPrepAddPrimaryKey() with the query below:

CREATE TABLE t0(c0 boolean);
CREATE TABLE t1() INHERITS(t0);

# ALTER TABLE t0 ADD CONSTRAINT m EXCLUDE ((1) WITH =);
server closed the connection unexpectedly

The related codes are

    foreach(lc, stmt->indexParams)
    {
        IndexElem  *elem = lfirst_node(IndexElem, lc);
        Constraint *nnconstr;

        Assert(elem->expr == NULL);

It seems to be introduced by b0e96f3119.

Thanks
Richard
Re: [17] CREATE SUBSCRIPTION ... SERVER
On Fri, Sep 1, 2023 at 2:47 AM Joe Conway wrote:
>
> On 8/31/23 12:52, Jeff Davis wrote:
> > On Thu, 2023-08-31 at 10:59 +0530, Ashutosh Bapat wrote:
> >> The server's FDW has to be postgres_fdw. So we have to handle the
> >> awkward dependency between core and postgres_fdw (an extension).
> >
> > That sounds more than just "awkward". I can't think of any precedent
> > for that and it seems to violate the idea of an "extension" entirely.
> >
> > Can you explain more concretely how we might resolve that?
>
> Maybe move postgres_fdw to be a first-class built-in feature instead of
> an extension?

Yes, that's one way. Thinking larger, how about we allow any FDW to be used
here? We might as well allow extensions to start logical receivers which
accept changes from non-PostgreSQL databases, so we don't have to make an
exception for postgres_fdw. But I think there's some value in bringing
together these two subsystems, which deal with foreign data logically (as in
the logical vs. physical view of data).

--
Best Wishes,
Ashutosh Bapat
Re: Improve heapgetpage() performance, overhead from serializable
This thread [1] is also about improving the heapgetpage function, and looks
similar to this one.

[1] https://www.postgresql.org/message-id/a9f40066-3d25-a240-4229-ec2fbe94e7a5%40yeah.net

Muhammad Malik wrote on Fri, Sep 1, 2023 at 04:04:
> Hi,
>
> Is there a plan to merge this patch in PG16?
>
> Thanks,
> Muhammad
>
> --
> *From:* Andres Freund
> *Sent:* Saturday, July 15, 2023 6:56 PM
> *To:* pgsql-hack...@postgresql.org
> *Cc:* Thomas Munro
> *Subject:* Improve heapgetpage() performance, overhead from serializable
>
> Hi,
>
> Several loops which are important for query performance, like heapgetpage()'s
> loop over all tuples, have to call functions like
> HeapCheckForSerializableConflictOut() and PredicateLockTID() in every
> iteration.
>
> When serializable is not in use, all those functions do is return. But
> being situated in a different translation unit, the compiler can't inline
> (without LTO at least) the check whether serializability is needed. It's not
> just the function call overhead that's noticeable, it's also that registers
> have to be spilled to the stack / reloaded from memory etc.
>
> On a freshly loaded pgbench scale 100, with turbo mode disabled, postgres
> pinned to one core. Parallel workers disabled to reduce noise. All times are
> the average of 15 executions with pgbench, in a newly started, but prewarmed
> postgres.
> > SELECT * FROM pgbench_accounts OFFSET 1000;
> HEAD:
> 397.977
>
> removing the HeapCheckForSerializableConflictOut() from heapgetpage()
> (incorrect!), to establish the baseline of what serializable costs:
> 336.695
>
> pulling out CheckForSerializableConflictOutNeeded() from
> HeapCheckForSerializableConflictOut() in heapgetpage(), and avoiding calling
> HeapCheckForSerializableConflictOut() in the loop:
> 339.742
>
> moving the loop into a static inline function, marked as pg_always_inline,
> called with static arguments for always_visible, check_serializable:
> 326.546
>
> marking the always_visible, !check_serializable case likely():
> 322.249
>
> removing TestForOldSnapshot() calls, which we pretty much already decided on:
> 312.987
>
> FWIW, there's more we can do; with some hacky changes I got the time down to
> 273.261, but the tradeoffs start to be a bit more complicated. And
> 397->320ms for something as core as this is imo worth considering on its own.
>
> Now, this just affects the sequential scan case. heap_hot_search_buffer()
> shares many of the same pathologies. I find it a bit harder to improve,
> because the compiler's code generation seems to switch between good / bad
> with changes that seem unrelated...
>
> I wonder why we haven't used PageIsAllVisible() in heap_hot_search_buffer()
> so far?
>
> Greetings,
>
> Andres Freund