Re: Eagerly evict bulkwrite strategy ring
Hi, On Fri, 12 Sept 2025 at 02:17, Melanie Plageman wrote: > > While looking at this, I decided it didn't make sense to have sweep > variables in the strategy object, so I've actually changed the way > StrategySweepNextBuffer() works. There was also an issue with the > sweep -- it could run into and past the starting buffer. So, I had to > change it. Take a look at the new method and let me know what you > think. It looks good to me. StrategySweepNextBuffer()'s comment is no longer correct, though. -- Regards, Nazir Bilal Yavuz Microsoft
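For readers who don't have the patch at hand, the hazard being described is the usual one for any circular scan: if the sweep does not remember where it started, it can lap the ring and visit buffers twice. A minimal sketch of the kind of guard involved, using invented names rather than the actual StrategySweepNextBuffer() code:

/*
 * Hypothetical bounded ring sweep, for illustration only; the real
 * StrategySweepNextBuffer() in the patch is structured differently.
 */
typedef struct SweepState
{
	int		nbuffers;		/* size of the strategy ring */
	int		start_idx;		/* slot where the sweep began */
	int		cur_idx;		/* slot returned most recently */
} SweepState;

/* Return the next ring slot, or -1 once the sweep wraps back to its start. */
static int
sweep_next_slot(SweepState *sweep)
{
	int		next = (sweep->cur_idx + 1) % sweep->nbuffers;

	if (next == sweep->start_idx)
		return -1;				/* lapped the ring: stop before the starting buffer */

	sweep->cur_idx = next;
	return next;
}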
RE: Conflict detection for update_deleted in logical replication
On Tuesday, September 2, 2025 6:00 PM shveta malik wrote: > > On Mon, Sep 1, 2025 at 5:45 PM shveta malik > wrote: > > > > > > > > Here is V70 patch set. > > > > > > > The patch v70-0001 looks good to me. Verified, all the old issues are > > resolved. > > > > Will resume review of v70-0002 now. > > > > Please find a few comments on v70-0002: > > 1) > - * Note: Retention won't be resumed automatically. The user must manually > - * disable retain_dead_tuples and re-enable it after confirming that the > - * replication slot maintained by the launcher has been dropped. > + * The retention will resume automatically if the worker has confirmed > + that the > + * retention duration is now within the max_retention_duration. > > Do we need this comment atop stop as it does not directly concern stop? Isn't > the details regarding RDT_RESUME_CONFLICT_INFO_RETENTION > in the file-header section sufficient? Agreed. I removed this comment. > > 2) > + /* Stop retention if not yet */ > + if (MySubscription->retentionactive) > + { > + rdt_data->phase = RDT_STOP_CONFLICT_INFO_RETENTION; > > - /* process the next phase */ > - process_rdt_phase_transition(rdt_data, false); > + /* process the next phase */ > + process_rdt_phase_transition(rdt_data, false); } > + > + reset_retention_data_fields(rdt_data); > > should_stop_conflict_info_retention() does reset_retention_data_fields > everytime when wait-time exceeds the limit, and when it actually stops i.e. > calls stop_conflict_info_retention through phase change; the stop function > also does reset_retention_data_fields and calls process_rdt_phase_transition. > Can we optimize this code part by consolidating the > reset_retention_data_fields() and > process_rdt_phase_transition() calls into > should_stop_conflict_info_retention() itself, eliminating redundancy? Agreed. I improved the code here. > > 3) > Shall we update 035_conflicts.pl to have a resume test by setting > max_retention_duration to 0 after stop-retention test? Added. > > 4) > + subscription. The retention will be automatically resumed > once at least > + one apply worker confirms that the retention duration is within the > + specified limit, or a new subscription is created with > + retain_dead_tuples = true, or the user manually > + re-enables retain_dead_tuples. > > Shall we rephrase it slightly to: > > Retention will automatically resume when at least one apply worker confirms > that the retention duration is within the specified limit, or when a new > subscription is created with retain_dead_tuples = true. > Alternatively, retention can be manually resumed by re-enabling > retain_dead_tuples. Changed as suggested. Here is V71 patch set which addressed above comments and [1]. [1] https://www.postgresql.org/message-id/CAJpy0uC8w442wGEJ0gyR23ojAyvd-s_g-m8fUbixy0V9yOmrcg%40mail.gmail.com Best Regards, Hou zj v71-0002-Add-a-dead_tuple_retention_active-column-in-pg_s.patch Description: v71-0002-Add-a-dead_tuple_retention_active-column-in-pg_s.patch v71-0001-Allow-conflict-relevant-data-retention-to-resume.patch Description: v71-0001-Allow-conflict-relevant-data-retention-to-resume.patch
Re: postmaster uses more CPU in 18 beta1 with io_method=io_uring
Hi Andres / Robert, On Mon, Sep 8, 2025 at 5:55 PM Andres Freund wrote: > > Hi, > > On 2025-09-06 09:12:19 -0400, Robert Treat wrote: [..] > > > [..], but 6.1 is the LTS kernel, so plenty of people > > > are going to hit those regressions with io_uring io_method, won't > > > they? > > I doubt it, but who knows. RHEL 8.x won't have it (RH KB [1] says "RHEL 8.x: The addition to RHEL8 was being tracked in private Bug 1881561 - Add io_uring support. Unfortunately, it has been decided that io_uring support will not be enabled in RHEL8." RHEL 9.x seems to be all based on 5.14.x (so much below 6.5.x) and states that uring is in Tech Preview there and is disabled, but it can be enabled via sysctl. Hard to tell what they will backpatch into 5.14.x there. So if anywhere, I would speculate it would be RHEL9 (?), therefore 5.14.x (+their custom back patches). > > > I can try to prepare a patch, please just let me know. > > Yes, please do. Attached. > If they just upgrade in-place, they won't use io_uring. And they won't simply > use io_uring with this large max_connections without also tuning the file > descriptor limits... Business as usual, just another obstacle... -J. v1-0001-aio-warn-user-if-combined-io_uring-memory-mapping.patch Description: Binary data
Add support for entry counting in pgstats
Hi all, One obstacle to the move of pg_stat_statements to the shmem pgstats that we have in core is that there is no easy way to track the total number of entries that are stored in the shared hash table of pgstats for variable-sized stats kinds. PGSS currently uses hash_get_num_entries() as a cheap way to get the number of entries stored in its hash table, but moving PGSS to the in-core stats facility means that we need a different approach to count these entries. I have looked at what can be done, and finished with a rather simple approach, as we have only one code path when a new entry is inserted, and one code path when an entry is removed when the refcount of an entry reaches 0 (includes resets, drops for kind matches, etc.). The counters are stored in a uint64 atomic, one for each stats kind in PgStat_ShmemControl, set only if the option is enabled. I am wondering how much protection we want for the reads, and chose the lightest "lossy" approach, leaving it to each stats kind to decide what to do when deallocating entries. Hence, a stats kind may want to protect the entry deallocations with an extra lock, but that's not required either depending on how aggressive removals should happen. Please find attached a patch that adds an option for variable-sized stats kinds, giving them the possibility to track the number of entries in the shared hash table. The option is named track_counts. The patch has some coverage added in the test module injection_points, in its TAP test. Thoughts are welcome. -- Michael From 0c924a72f586c385f6ab1a22174b9c3b1cf2dd08 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Fri, 12 Sep 2025 16:09:51 +0900 Subject: [PATCH 1/2] Add support for entry counting in pgstats Stats kinds can set an option call track_counts, that will make pgstats track the number of entries that exist in the shared hashtable. As there is only one code path where a new entry is added, and one code path where entries are removed, the count tracking is rather straight-forward. Reads of these counters are optimistic, and may change across two calls. A use case of this facility is PGSS, where we need to be able to cap the number of entries that would be stored in the shared hashtable, based on its "max" GUC. --- src/include/utils/pgstat_internal.h | 26 + src/backend/utils/activity/pgstat.c | 6 +++ src/backend/utils/activity/pgstat_shmem.c | 47 +-- 3 files changed, 67 insertions(+), 12 deletions(-) diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h index 6cf8f633..8762a06081f8 100644 --- a/src/include/utils/pgstat_internal.h +++ b/src/include/utils/pgstat_internal.h @@ -216,6 +216,12 @@ typedef struct PgStat_KindInfo /* Should stats be written to the on-disk stats file? */ bool write_to_file:1; + /* + * Should the number of entries be tracked? For variable-numbered stats, + * to update its PgStat_ShmemControl.entry_counts. + */ + bool track_counts:1; + /* * The size of an entry in the shared stats hash table (pointed to by * PgStatShared_HashEntry->body). For fixed-numbered statistics, this is @@ -485,6 +491,15 @@ typedef struct PgStat_ShmemControl */ pg_atomic_uint64 gc_request_count; + /* + * Counters for the number of entries associated to a single stats kind + * that uses variable-numbered objects stored in the shared hash table. + * These counters can be enabled on a per-kind basis, when track_counts is + * set. This counter is incremented each time a new entry is created (not + * when reused), and each time an entry is dropped.
+ */ + pg_atomic_uint64 entry_counts[PGSTAT_KIND_MAX]; + /* * Stats data for fixed-numbered objects. */ @@ -910,6 +925,17 @@ pgstat_get_entry_data(PgStat_Kind kind, PgStatShared_Common *entry) return ((char *) (entry)) + off; } +/* + * Returns the number of entries counted for a stats kind. + */ +static inline uint64 +pgstat_get_entry_count(PgStat_Kind kind) +{ + Assert(pgstat_get_kind_info(kind)->track_counts); + + return pg_atomic_read_u64(&pgStatLocal.shmem->entry_counts[kind - 1]); +} + /* * Returns a pointer to the shared memory area of custom stats for * fixed-numbered statistics. diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c index f8e91484e36b..f6784ad6cd15 100644 --- a/src/backend/utils/activity/pgstat.c +++ b/src/backend/utils/activity/pgstat.c @@ -1490,6 +1490,12 @@ pgstat_register_kind(PgStat_Kind kind, const PgStat_KindInfo *kind_info) ereport(ERROR, (errmsg("custom cumulative statistics property is invalid"), errhint("Custom cumulative statistics require a shared memory size for fixed-numbered objects."))); + if (kind_info->track_counts) + { + ereport(ERROR, + (errmsg("custom cumulative statistics property is invalid"), + errhint("Custom cumulative statistics cannot use counter tracking for fixed-numbered objects."))); + } } /* diff --git a/src/backend/ut
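To make the PGSS-style use case from the proposal a bit more concrete: with track_counts set, a variable-numbered stats kind could consult the counter before creating a new entry. The sketch below is purely illustrative — the kind id, the max setting, and the eviction helper are invented; only track_counts and pgstat_get_entry_count() come from the patch above, and their final shape may still change:

/* Hypothetical consumer of the new counter; not part of the patch. */
#define MY_STATS_KIND	((PgStat_Kind) 130)		/* invented custom kind id */

static void
my_stats_upsert(uint64 objid, uint64 max_entries)
{
	/*
	 * Optimistic read: the value may move underneath us, which is fine for
	 * a soft cap such as pg_stat_statements.max.
	 */
	if (pgstat_get_entry_count(MY_STATS_KIND) >= max_entries)
		my_stats_evict_one();	/* invented deallocation path */

	/* ... then create or update the entry for objid as usual ... */
}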
Re: someone else to do the list of acknowledgments
516 contributors this cycle vs 462 last cycle. When name accents/capitalization differed, I went with the string used in the previous year. Some of the names that come from bug reports and doc fixes are just first names, and the discussion threads shed no light on the full name of the person. I have a git log that has been "redacted", which is to say that every name attribution has been screened out, as well as names of known past contributors, and various other name-looking things that aren't people (ex. Microsoft Windows) screened out. Even with all that automation it is still over 50k lines, so if you want to double check that, make sure you have a comfy chair. On Tue, Sep 9, 2025 at 12:07 PM jian he wrote: > On Sun, Aug 17, 2025 at 12:40 PM Corey Huinker > wrote: > > > > On Sat, Aug 16, 2025 at 11:58 PM jian he > wrote: > >> > >> hi. > >> maybe we should start working on this? > >> > >> https://www.postgresql.org/developer/roadmap > >> says 18 will be released in September 2025. > > > > > > I am. > > > > hi. > Maybe you can share the draft patch so others can review it. > > https://www.postgresql.org says: > "The planned date for the general availability of PostgreSQL 18 is > September 25, 2025." > Abhishek Chanda Adam Guo Adam Rauch Aidar Imamov Ajin Cherian Alastair Turner Alec Cozens Aleksander Alekseev Alena Rybakina Alex Friedman Alex Richman Alexander Alehin Alexander Borisov Alexander Korotkov Alexander Kozhemyakin Alexander Kukushkin Alexander Kuzmenkov Alexander Kuznetsov Alexander Lakhin Alexander Pyhalov Alexandra Wang Alexey Dvoichenkov Alexey Makhmutov Alexey Shishkin Ali Akbar Alvaro Herrera Amit Kapila Amit Langote Amul Sul Andreas Karlsson Andreas Scherbaum Andreas Ulbrich Andrei Lepikhov Andres Freund Andrew Andrew Bille Andrew Dunstan Andrew Jackson Andrew Kane Andrew Watkins Andrey Borodin Andrey Chudnovsky Andrey Lepikhov Andrey Rachitskiy Andrey Rudometov Andy Alsup Andy Fan Anthonin Bonnefoy Anthony Hsu Anthony Leung Anton A. Melnikov Anton Melnikov Anton Voloshin Antonin Houska Antti Lampinen Arseniy Mukhin Artur Zakirov Arun Thirupathi Ashutosh Bapat Asphator Atsushi Torikoshi Avi Weinberg Aya Iwata Aysén Region Ayush Tiwari Ayush Vatsa Bastien Roucariès Ben Peachey Higdon Benoit Lobréau Bernd Helmle Bernd Reiß Bernhard Wiedemann Bertrand Drouvot Bertrand Mamasam Bharath Rupireddy Bogdan Grigorenko Boyu Yang Braulio Fdo Gonzalez Bruce Momjian Bykov Ivan Cameron Vogt Cary Huang Cees van Zeeland ChangAo Chen Chao Li Chapman Flack Charles Samborski Chengxi Sun Chiranmoy Bhattacharya Chris Gooch Christoph Berg Christophe Courtois Christopher Inokuchi Clemens Ruck Corey Huinker Craig Milhiser Crisp Lee Cédric Villemain Dagfinn Ilmari Mannsåker Daniel Elishakov Daniel Gustafs Daniel Gustafsson Daniel Vérité Daniel Westermann Daniele Varrazzo Daniil Davydov Daria Shanina Dave Cramer Dave Page David Benjamin David Christensen David E. Wheeler David Fiedler David G. Johnston David Geier David Rowley David Steele David Wheeler David Zhang Davinder Singh Dean Rasheed Devanga Susmitha Devrim Gündüz Dian Fay Dilip Kumar Dimitrios Apostolou Dipesh Dhameliya Dmitrii Bondar Dmitry Dmitry Dolgov Dmitry Koval Dmitry Kovalenko Dominique Devienne Donghang Lin Dorjpalam Batbaatar Drew Callahan Duncan Sands Dzmitry Jachnik Egor Chindyaskin Egor Rogov Emanuele Musella Emre Hasegeli Eric Cyr Erica Zhang Erik Nordström Erik Rijkers Erik Wienhold Erki Eessaar Ethan Mertz Etienne LAFARGE Etsuro Fujita Euler Taveira Evan Si Evgeniy Gorbanev Fabio R. 
Sluzala Fabrízio de Royes Mello Feike Steenbergen Feliphe Pozzer Felix Fire Emerald Florents Tselai Francesco Degrassi Frank Streitzig Fredrik Widlert Frédéric Yhuel Fujii Masao Gabriele Bartolini Gavin Panella Geoff Winkless George MacKerron Gilles Darold Grant Gryczan Greg Burd Greg Sabino Mullane Greg Stark Grigory Kryachko Guillaume Lelarge Gunnar Morling Gurjeet Singh Hacking Discord Haifang Wang Hajime Matsunaga Hamid Akhtar Hannu Krosing Hari Krishna Sunder Haruka Takatsuka Hayato Kuroda Heikki Linnakangas Hironobu Suzuki Holger Jakobs Hou Zhijie Hubert Lubaczewski Hugo DUBOIS Hugo Zhang Hunaid Sohail Hywel Carver Ian Barwick Ibrar Ahmed Igor Gnatyuk Igor Korot Ilia Evdokimov Ilya Gladyshev Ilyasov Ian Imran Zaheer Isaac Morland Israel Barth Rubio Ivan Kush Jacob Brazeal Jacob Champion Jaime Casanova Jakob Egger Jakub Wartak James Coleman James Hunter Jan Behrens Japin Li Jason Smith Jayesh Dehankar Jeevan Chalke Jeff Davis Jehan-Guillaume de Rorthais Jelte Fennema-Nio Jian He Jianghua Yang Jiao Shuntian Jim Jones Jim Nasby Jingtang Zhang Jingzhou Fu Joe Conway Joel Jacobson John Hutchins John Naylor Jonathan Katz Josef Šimánek Joseph Koshakow José Villanova Julien Rouhaud Junwang Zhao Justin Pryzby Kaido Vaikla Kaimeh Karina Litskevich Karthik S Kartyshov Ivan Kashif Zeeshan Katsuragi Yuta Keisuke Kuroda Kevin Hale Boyes Kevin K Biju Kirill Reshke Kirill Zdornyy Koen De Groote Koichi Suzuki Koki Nakamura Konstantin Knizhnik Kuntal Ghosh Kuntal G
Re: issue with synchronized_standby_slots
On Tue, Sep 9, 2025 at 2:39 PM Zhijie Hou (Fujitsu) wrote: > I agree. For synchronized_standby_slots, I think it is acceptable to report > only > parsing errors, because slots could be dropped even after validating the slot > existence during GUC loading. Additionally, we would report WARNINGs for > non-existent slots during the wait function anyway (e.g., in > StandbySlotsHaveCaughtup()). +1, to also address the issue I reported upthread. As I understand it, only the walsender and the slot sync worker actually use synchronized_standby_slots. But in the current code, all processes check for slot existence via the GUC check hook, which wastes cycles. The proposed change would eliminate that overhead. Regards, -- Fujii Masao
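For anyone skimming the thread, the shape of the change being discussed is roughly this: the check hook keeps only the cheap syntactic validation, and the existence checks move to the processes that actually consume the list (walsender and slot sync worker). A simplified, hypothetical hook of that form — not the actual patch — might look like:

/* Hypothetical parse-only check hook; illustration only, not the proposed patch. */
static bool
check_slot_name_list(char **newval, void **extra, GucSource source)
{
	char	   *rawstring;
	List	   *namelist = NIL;
	bool		ok;

	/* The empty string (the default) trivially passes. */
	if (*newval == NULL || (*newval)[0] == '\0')
		return true;

	/* Only verify that the value is a well-formed, comma-separated list. */
	rawstring = pstrdup(*newval);
	ok = SplitIdentifierString(rawstring, ',', &namelist);
	if (!ok)
		GUC_check_errdetail("List syntax is invalid.");

	/*
	 * Deliberately no slot-existence check here: slots can be created or
	 * dropped after the GUC is loaded anyway, so the consumers report
	 * missing slots at the point of use instead.
	 */
	list_free(namelist);
	pfree(rawstring);
	return ok;
}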
Re: [PATCH] jit: fix build with LLVM-21
On 2025-09-12 08:36, Peter Eisentraut wrote: On 08.09.25 15:20, Holger Hoffstätte wrote: I tried building against LLVM-21 and noticed that a function for symbol lookup was renamed (without semantic changes), breaking the LLVM JIT. The following patch fixes this by adding a version guard. It applies equally to both master and 17.6. Passes the test suite and verified on 17.6 with the jit example from the documentation. I can confirm that this change seems correct. See [0] for reference. Excellent! Thanks for taking a look. This was my first post to pgsql-hackers and I wasn't sure if I had done something wrong. [0]: https://github.com/llvm/llvm-project/commit/d3d856ad84698fa4ec66177d00558b2f5b438d3b As a small style request, I would flip the conditional around so that the new code appears first. I see that we don't do this very consistently in the existing code, but maybe we can start a new trend. ;-) I knew this would come up since I pondered the same thing :D As you said, the existing code is not consistent, but I can switch this to let the new code appear first, if you prefer. A new patch is attached. In my testing with LLVM 21, I'm getting an additional error: ../src/backend/jit/llvm/llvmjit_wrap.cpp:56:18: error: no matching constructor for initialization of 'llvm::orc::RTDyldObjectLinkingLayer' 56 | return wrap(new llvm::orc::RTDyldObjectLinkingLayer( | ^ 57 | *unwrap(ES), [] { return std::make_unique(nullptr, true); })); | ~ /opt/homebrew/Cellar/llvm/21.1.0/include/llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h:58:3: note: candidate constructor not viable: no known conversion from '(lambda at ../src/backend/jit/llvm/llvmjit_wrap.cpp:57:16)' to 'GetMemoryManagerFunction' (aka 'unique_function (const MemoryBuffer &)>') for 2nd argument 58 | RTDyldObjectLinkingLayer(ExecutionSession &ES, | ^ 59 | GetMemoryManagerFunction GetMemoryManager); | ~ /opt/homebrew/Cellar/llvm/21.1.0/include/llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h:37:16: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 2 were provided 37 | class LLVM_ABI RTDyldObjectLinkingLayer | ^~~~ I gather you're not seeing that? I do not - I dropped the patch into my Gentoo package build for 17.6 and it has been working fine when building with gcc-15.2 and the jit part with clang-21.1.0. I also just rebuilt everything with clang-21.1.1 (no gcc) and also do not see the problem. This is on amd64 Linux. The following commit seems to be involved: https://github.com/postgres/postgres/commit/9044fc1d45a0212fd123bd8f364eac058f60fed7 which is also in my 17.6 tree. I verified that the new SectionMemoryManager.cpp is compiled without error. Since you're using homebrew I guess this is triggered by the __aarch64__ guard in src/include/jit/llvmjit_backport.h. Unfortunately this is as far as I can help with this. Please let me know if there is anything else I can do. cheers Holger >From 699a5aa8f9438c902ac5ad063525193ddf0ebb59 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Holger=20Hoffst=C3=A4tte?= Date: Fri, 12 Sep 2025 09:40:31 +0200 Subject: [PATCH] jit: fix build with LLVM-21 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit LLVM-21 renamed llvm::GlobalValue::getGUID() to getGUIDAssumingExternalLinkage(), so add a version guard. 
Signed-off-by: Holger Hoffstätte --- src/backend/jit/llvm/llvmjit_inline.cpp | 4 1 file changed, 4 insertions(+) diff --git a/src/backend/jit/llvm/llvmjit_inline.cpp b/src/backend/jit/llvm/llvmjit_inline.cpp index 2764c3bbe2f..51b32cd9f94 100644 --- a/src/backend/jit/llvm/llvmjit_inline.cpp +++ b/src/backend/jit/llvm/llvmjit_inline.cpp @@ -238,7 +238,11 @@ llvm_build_inline_plan(LLVMContextRef lc, llvm::Module *mod) llvm_split_symbol_name(symbolName.data(), &cmodname, &cfuncname); +#if LLVM_VERSION_MAJOR >= 21 + funcGUID = llvm::GlobalValue::getGUIDAssumingExternalLinkage(cfuncname); +#else funcGUID = llvm::GlobalValue::getGUID(cfuncname); +#endif /* already processed */ if (inlineState.processed) -- 2.51.0
Re: Conflict detection for update_deleted in logical replication
On Fri, Sep 12, 2025 at 8:55 AM Zhijie Hou (Fujitsu) wrote: > > I agree. Here is a V73 patch that will restart the worker if the retention > resumes. I also addressed other comments posted by Amit[1]. > Few comments: * In adjust_xid_advance_interval(), for the case when retention is not active, we still cap the interval by wal_receiver_status_interval. Is that required or do we let it go till 3 minutes? We can add a new else if loop to keep the code clear, if you agree with this. * + /* + * Resume retention immediately if required. (See + * should_resume_retention_immediately() for details). + */ + if (should_resume_retention_immediately(rdt_data, status_received)) + rdt_data->phase = RDT_RESUME_CONFLICT_INFO_RETENTION; Is this optimization for the case when we are in stop_phase or after stop_phase and someone has changed maxretention to 0? If so, I don't think it is worth optimizing for such a rare case at the cost of making code difficult to understand. Apart from this, I have changed a few comments in the attached. -- With Regards, Amit Kapila. diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c index 1274192de7e..7d1ed506622 100644 --- a/src/backend/replication/logical/worker.c +++ b/src/backend/replication/logical/worker.c @@ -4363,10 +4363,7 @@ maybe_advance_nonremovable_xid(RetainDeadTuplesData *rdt_data, if (!can_advance_nonremovable_xid(rdt_data)) return; - /* -* Resume retention immediately if required. (See -* should_resume_retention_immediately() for details). -*/ + /* Resume dead_tuples retention immediately if required. */ if (should_resume_retention_immediately(rdt_data, status_received)) rdt_data->phase = RDT_RESUME_CONFLICT_INFO_RETENTION; @@ -4549,8 +4546,8 @@ wait_for_publisher_status(RetainDeadTuplesData *rdt_data, { /* * Stop retention if not yet. Otherwise, reset to the initial phase to -* retry all phases. This is required to recalculate the current wait -* time and resume retention if the time falls within +* retry resuming retention. This reset is required to recalculate the +* current wait time and resume retention if the time falls within * max_retention_duration. */ if (MySubscription->retentionactive) @@ -4683,8 +4680,8 @@ wait_for_local_flush(RetainDeadTuplesData *rdt_data) { /* * Stop retention if not yet. Otherwise, reset to the initial phase to -* retry all phases. This is required to recalculate the current wait -* time and resume retention if the time falls within +* retry resuming retention. This reset is required to recalculate the +* current wait time and resume retention if the time falls within * max_retention_duration. */ if (MySubscription->retentionactive) @@ -4763,9 +4760,7 @@ wait_for_local_flush(RetainDeadTuplesData *rdt_data) * Check whether conflict information retention should be stopped due to * exceeding the maximum wait time (max_retention_duration). * - * If retention should be stopped, transition to the - * RDT_STOP_CONFLICT_INFO_RETENTION phase and return true. Otherwise, return - * false. + * If retention should be stopped, return true. Otherwise, return false. */ static bool should_stop_conflict_info_retention(RetainDeadTuplesData *rdt_data)
Re: Conflict detection for update_deleted in logical replication
On Fri, Sep 12, 2025 at 8:55 AM Zhijie Hou (Fujitsu) wrote: > > > I agree. Here is a V73 patch that will restart the worker if the retention > resumes. I also addressed other comments posted by Amit[1]. > Thanks for the patch. Few comments: 1) There is a small window where the worker can exit while resuming retention and the launcher can end up accessing stale worker info. Let's say the launcher is at a stage where it has fetched the worker: w = logicalrep_worker_find(sub->oid, InvalidOid, false); And after this point, before the launcher could do compute_min_nonremovable_xid(), the worker has stopped retention and resumed as well. Now the worker has exited, but in compute_min_nonremovable_xid() the launcher will still use the worker info fetched previously. 2) if (should_stop_conflict_info_retention(rdt_data)) + { +/* + * Stop retention if not yet. Otherwise, reset to the initial phase to + * retry all phases. This is required to recalculate the current wait + * time and resume retention if the time falls within + * max_retention_duration. + */ +if (MySubscription->retentionactive) + rdt_data->phase = RDT_STOP_CONFLICT_INFO_RETENTION; +else + reset_retention_data_fields(rdt_data); + return; + } Shall we have an Assert( !MyLogicalRepWorker->oldest_nonremovable_xid) in the 'else' part above? thanks Shveta
Re: Add support for entry counting in pgstats
> On Sep 12, 2025, at 15:23, Michael Paquier wrote: > > > -- > Michael > <0001-Add-support-for-entry-counting-in-pgstats.patch><0002-injection_points-Add-entry-counting.patch> The code overall looks good to me, very clear and neat. Just a few small comments: 1 - 0001 ``` +* set. This counter is incremented each time a new entry is created (not +* when reused), and each time an entry is dropped. +*/ + pg_atomic_uint64 entry_counts[PGSTAT_KIND_MAX]; ``` "and each time an entry is dropped" is a little misleading, it should be "and decremented when an entry is removed". 2 - 0001 ``` + /* Increment entry count, if required. */ + if (kind_info->track_counts) + pg_atomic_fetch_add_u64(&pgStatLocal.shmem->entry_counts[kind - 1], 1); ``` The code is quite straightforward, so the comment seems unnecessary. 3 - 0001 ``` + if (kind_info->track_counts) + { + ereport(ERROR, + (errmsg("custom cumulative statistics property is invalid"), +errhint("Custom cumulative statistics cannot use counter tracking for fixed-numbered objects."))); + } ``` {} are not needed. Looks like in the existing code, when "if" has a single statement, {} are not used. There is a similar "if" right above this change. 4 - 0002 ``` diff --git a/src/test/modules/injection_points/injection_stats.c b/src/test/modules/injection_points/injection_stats.c index e3947b23ba57..15ad9883dedb 100644 --- a/src/test/modules/injection_points/injection_stats.c +++ b/src/test/modules/injection_points/injection_stats.c @@ -49,6 +49,7 @@ static const PgStat_KindInfo injection_stats = { .shared_data_len = sizeof(((PgStatShared_InjectionPoint *) 0)->stats), .pending_size = sizeof(PgStat_StatInjEntry), .flush_pending_cb = injection_stats_flush_cb, + .track_counts = true, }; ``` Would it be better to move ".track_counts" up to be with the other boolean flags? When they are defined together to share the same byte, it feels better to initialize them together as well. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
Re: How can end users know the cause of LR slot sync delays?
Hi Amit, On Mon, Sep 8, 2025 at 9:52 AM Ashutosh Sharma wrote: > > Hi Amit, > > On Sat, Sep 6, 2025 at 10:46 AM Amit Kapila wrote: > > > > On Fri, Sep 5, 2025 at 12:50 PM Ashutosh Sharma > > wrote: > > > > > > Good to hear that you’re also interested in working on this task. > > > > > > On Thu, Sep 4, 2025 at 8:26 PM Shlok Kyal > > > wrote: > > >> > > >> Hi Ashutosh, > > >> > > >> I am also interested in this thread. And was working on a patch for it. > > >> > > >> On Wed, 3 Sept 2025 at 17:52, Ashutosh Sharma > > >> wrote: > > >> > > > >> > Hi Amit, > > >> > > > >> > On Thu, Aug 28, 2025 at 3:26 PM Amit Kapila > > >> > wrote: > > >> >> > > >> >> On Thu, Aug 28, 2025 at 11:07 AM Ashutosh Sharma > > >> >> wrote: > > >> >> > > > >> >> > We have seen cases where slot synchronization gets delayed, for > > >> >> > example when the slot is behind the failover standby or vice versa, > > >> >> > and the slot sync worker has to wait for one to catch up with the > > >> >> > other. During this waiting period, users querying > > >> >> > pg_replication_slots can only see whether the slot has been > > >> >> > synchronized or not. If it has already synchronized, that’s fine, > > >> >> > but if synchronization is taking longer, users would naturally want > > >> >> > to understand the reason for the delay. > > >> >> > > > >> >> > Is there a way for end users to know the cause of slot > > >> >> > synchronization delays, so they can take appropriate actions to > > >> >> > speed it up? > > >> >> > > > >> >> > I understand that server logs are emitted in such cases, but logs > > >> >> > are not something end users would want to check regularly. > > >> >> > Moreover, since logging is configuration-based, relevant messages > > >> >> > may sometimes be skipped or suppressed. > > >> >> > > > >> >> > > >> >> Currently, the way to see the reason for sync skip is LOGs but I think > > >> >> it is better to add a new column like sync_skip_reason in > > >> >> pg_replication_slots. This can show the reasons like > > >> >> standby_LSN_ahead_remote_LSN. I think ideally users can compare > > >> >> standby's slot LSN/XMIN with remote_slot being synced. Do you have any > > >> >> better ideas? > > >> >> > > >> > > > >> > I have similar thoughts, but for clarity, I’d like to outline some of > > >> > the key steps I plan to take: > > >> > > > >> > Step 1: Define an enum for all possible reasons a slot persistence was > > >> > skipped. > > >> > > > >> > /* > > >> > * Reasons why a replication slot sync was skipped. > > >> > */ > > >> > typedef enum ReplicationSlotSyncSkipReason > > >> > { > > >> > RS_SYNC_SKIP_NONE = 0, /* No skip */ > > >> > > > >> > RS_SYNC_SKIP_REMOTE_BEHIND = (1 << 0), /* Remote slot is behind > > >> > local reserved LSN */ > > >> > > > >> > RS_SYNC_SKIP_DATA_LOSS = (1 << 1), /* Local slot ahead of > > >> > remote, risk of data loss */ > > >> > > > >> > RS_SYNC_SKIP_NO_SNAPSHOT = (1 << 2)/* Standby could not build > > >> > a consistent snapshot */ > > >> > } ReplicationSlotSyncSkipReason; > > >> > > > >> > -- > > >> > > > >> I think we should also add the case when "remote_slot->confirmed_lsn > > > >> latestFlushPtr" (WAL corresponding to the confirmed lsn on remote slot > > >> is still not flushed on the Standby). In this case as well we are > > >> skipping the slot sync. > > > > > > > > > Yes, we can include this case as well. 
> > > > > >> > > >> > > >> > Step 2: Introduce new column to pg_replication_slots to store the skip > > >> > reason > > >> > > > >> > /* Inside pg_replication_slots table */ > > >> > ReplicationSlotSyncSkipReason slot_sync_skip_reason; > > >> > > > >> > -- > > >> > > > >> As per the discussion [1], I think it is more of stat related data and > > >> we should add it in the pg_stat_replication_slots view. Also we can > > >> add columns for 'slot sync skip count' and 'last slot sync skip'. > > >> Thoughts? > > > > > > > > > It’s not a bad choice, but what makes it a bit confusing for me is that > > > some of the slot sync information is stored in pg_replication_slots, > > > while some is in pg_stat_replication_slots. > > > > > > > How about keeping sync_skip_reason in pg_replication_slots and > > sync_skip_count in pg_stat_replication_slots? > > > > I think we can do that, since sync_skip_reason appears to be a > descriptive metadata rather than statistical data like > slot_sync_skip_count and last_slot_sync_skip. However, it's also true > that all three pieces of data are transient by nature - they will just > be present in the runtime. > After spending some more time on this, I found that maintaining sync_skip_reason in pg_replication_slots would make the code changes a bit messy and harder to maintain. I think storing all 3 pieces of information - sync_skip_reason, sync_skip_count, and last_sync_skip in pg_stat_replication_slots would be a cleaner solution. This way, all the sync-related info stays together and the code remain
Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
On Fri, Sep 12, 2025 at 9:38 AM Amit Kapila wrote: > > > One thing related to this which needs a discussion is after this > change, it is possible that part of the transaction contains > additional logical_wal_info. I couldn't think of a problem due to this > but users using pg_waldump or other WAL reading utilities could > question this. One possibility is that we always start including > logical_wal_info for the next new transaction but not sure if that is > required. It would be good if other people involved in the discussion > or otherwise could share their opinion on this point. > AFAIR, logical info is a separate section in a WAL record, and there is no marker which says "WAL will contain logical info henceforth". So the utilities should be checking for the existence of such info before reading it. So I guess it should be ok. Some extra-sensitive utilities may expect that once a WAL record has logical info, all the succeeding WAL records will have it. They may find it troublesome that WAL records with and without logical info are interleaved. Generally, I would prefer that presence/absence of logical info changes at transaction boundaries, but we will still have interleaving WAL records. So I doubt how much that matters. Sorry for jumping late in the discussion. I have a few comments, mostly superficial ones. I am yet to take a deeper look at the synchronization logic. @@ -328,8 +362,7 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU that could be needed by the logical decoding on the standby (as it does not know about the catalog_xmin on the standby). Existing logical slots on standby also get invalidated if - wal_level on the primary is reduced to less than - logical. + logical decoding becomes disabled on the primary. s/becomes disabled/is disabled/ or /gets disabled/. Given that logical decoding can be disabled in multiple ways, it's better to add a reference here to a section which explains what disabling logical decoding means. wal_level_insufficient means that the - primary doesn't have a sufficient to - perform logical decoding. It is set only for logical slots. + logical decoding is disabled on primary due to insufficient + or no logical slots. It is set only + for logical slots. It may not be apparent to the users that insufficient wal_level means 'minimal' here. It will be better if we just mention logical decoding is disabled on primary and refer to a section which explains what disabling logical decoding means. * * Skip this if we're taking a full-page image of the new page, as we * don't include the new tuple in the WAL record in that case. Also - * disable if wal_level='logical', as logical decoding needs to be able to - * read the new tuple in whole from the WAL record alone. + * disable if logical decoding is enabled, as logical decoding needs to be + * able to read the new tuple in whole from the WAL record alone. */ Not the fault of this patch, but I find the comment to be slightly out of sync with the code. The code actually checks whether logical decoding is enabled and whether the relation requires WAL to be logged for logical decoding. The difference is subtle, but it did cause me a bit of confusion when I read the code. Please consider rephrasing the comment while you are modifying it.
if (oldbuf == newbuf && !need_tuple_data && !XLogCheckBufferNeedsBackup(newbuf)) @@ -9057,8 +9057,8 @@ log_heap_update(Relation reln, Buffer oldbuf, /* * Perform XLogInsert of an XLOG_HEAP2_NEW_CID record * - * This is only used in wal_level >= WAL_LEVEL_LOGICAL, and only for catalog - * tuples. + * This is only used when effective WAL level is logical, and only for Given that the earlier comment used GUC name, it's better to use the GUC name "effective_wal_level" here. case XLOG_PARAMETER_CHANGE: + + /* + * Even if wal_level on the primary got decreased to 'replica' it + * doesn't necessarily mean to disable the logical decoding as + * long as we have at least one logical slot. So we don't check + * the logical decoding availability here but do in + * XLOG_LOGICAL_DECODING_STATUS_CHANGE case. + */ The earlier code checked for wal_level < WAL_LEVEL_LOGICAL, which includes the case when wal_level is 'minimal'. I don't see the new code handling 'minimal' case here. Am I missing something? Do we need a comment here which specifically mentions "minimal" case grammar "Even if wal_level on the primary was lowered to 'replica', as long as there is at least one logical slot, the logical decoding remains enabled. ... " also ... do so in ... . + * The module maintains separate controls of two aspects: writing information + * required by logical decoding to WAL records and utilizing logical decoding + * itself, controlled by LogicalDecodingCtl->xlog_logical_info and + * ->logical_decoding_enabled fields respectively. The activation process of + * logical decoding involves several steps, beginning with maintaining logical + * decoding in a disabled stat
let ALTER COLUMN SET DATA TYPE cope with POLICY dependency
Hi. In [1], RememberAllDependentForRebuilding has this comment: /* * A policy can depend on a column because the column is * specified in the policy's USING or WITH CHECK qual * expressions. It might be possible to rewrite and recheck * the policy expression, but punt for now. It's certainly * easy enough to remove and recreate the policy; still, FIXME * someday. */ After 11 years, I am trying to allow column type changes to cope with security policy dependencies. CREATE TABLE s (a int, b int); CREATE POLICY p2 ON s USING (s.b = 1); --master branch will result error ALTER TABLE s ALTER COLUMN b SET DATA TYPE INT8; ERROR: cannot alter type of a column used in a policy definition DETAIL: policy p2 on table s depends on column "b" With the attached patch, ALTER TABLE SET DATA TYPE can cope with columns that have an associated security policy. The above ALTER TABLE SET DATA TYPE will just work fine. The code roughly follows how statistics are recreated after a column data type change. Currently the table rewrite does not recheck the policy expression, for example: RESET SESSION AUTHORIZATION; CREATE USER regress_rls_alice NOLOGIN; GRANT ALL ON SCHEMA public to public; DROP TABLE IF EXISTS R1; SET row_security = on; begin; set role regress_rls_alice; CREATE TABLE r1 (a int, b int GENERATED ALWAYS AS (a * 10) STORED); INSERT INTO r1 VALUES (1), (2), (4); CREATE POLICY p0 ON r1 USING (true); CREATE POLICY p1 ON r1 AS RESTRICTIVE USING (b > 10); ALTER TABLE r1 ENABLE ROW LEVEL SECURITY; ALTER TABLE r1 FORCE ROW LEVEL SECURITY; commit; set role regress_rls_alice; INSERT INTO r1 VALUES (0); -- Should fail p1 ALTER TABLE r1 ALTER COLUMN b SET EXPRESSION AS (-1); --OK So I guess it should be fine that the ALTER TABLE SET DATA TYPE table rewrite does not recheck the policy? [1] https://git.postgresql.org/cgit/postgresql.git/commit/?id=143b39c1855f8a22f474f20354ee5ee5d2f4d266 From be887b714a3bc788b6859954fad63137d64d1f61 Mon Sep 17 00:00:00 2001 From: jian he Date: Fri, 12 Sep 2025 16:06:53 +0800 Subject: [PATCH v1 1/1] let ALTER COLUMN SET DATA TYPE cope with POLICY dependency CREATE TABLE s (a int, b int); CREATE POLICY p2 ON s USING (s.b = 1); --no-error, while master branch will result error ALTER TABLE s ALTER COLUMN b SET DATA TYPE INT8; discussion: https://postgr.es/m/ --- src/backend/commands/policy.c | 59 ++ src/backend/commands/tablecmds.c | 122 ++-- src/backend/parser/gram.y | 1 + src/backend/utils/adt/ruleutils.c | 185 ++ src/include/commands/policy.h | 1 + src/include/nodes/parsenodes.h| 2 + src/include/utils/ruleutils.h | 1 + .../test_ddl_deparse/test_ddl_deparse.c | 3 + src/test/regress/expected/rowsecurity.out | 79 src/test/regress/sql/rowsecurity.sql | 41 10 files changed, 479 insertions(+), 15 deletions(-) diff --git a/src/backend/commands/policy.c b/src/backend/commands/policy.c index 83056960fe4..c287c08f155 100644 --- a/src/backend/commands/policy.c +++ b/src/backend/commands/policy.c @@ -24,8 +24,10 @@ #include "catalog/namespace.h" #include "catalog/objectaccess.h" #include "catalog/pg_authid.h" +#include "catalog/pg_depend.h" #include "catalog/pg_policy.h" #include "catalog/pg_type.h" +#include "commands/comment.h" #include "commands/policy.h" #include "miscadmin.h" #include "nodes/pg_list.h" @@ -755,6 +757,11 @@ CreatePolicy(CreatePolicyStmt *stmt) relation_close(target_table, NoLock); table_close(pg_policy_rel, RowExclusiveLock); + /* Add any requested comment */ + if (stmt->polcomment != NULL) + CreateComments(policy_id, PolicyRelationId, 0, + stmt->polcomment); + return myself; } @@ -1277,3 +1284,55 @@ 
relation_has_policies(Relation rel) return ret; } + +/* + * PoliciesGetRelations - + * Collect all relations that this policy depends on. + * Since the policy's check qual or qual may reference other relations, we + * include those as well. + */ +List * +PoliciesGetRelations(Oid policyId) +{ + List *result = NIL; + Relation depRel; + ScanKeyData key[2]; + SysScanDesc depScan; + HeapTuple depTup; + + /* + * We scan pg_depend to find those things that policy being depended on. + */ + depRel = table_open(DependRelationId, AccessShareLock); + + ScanKeyInit(&key[0], +Anum_pg_depend_classid, +BTEqualStrategyNumber, F_OIDEQ, +ObjectIdGetDatum(PolicyRelationId)); + ScanKeyInit(&key[1], +Anum_pg_depend_objid, +BTEqualStrategyNumber, F_OIDEQ, +ObjectIdGetDatum(policyId)); + + depScan = systable_beginscan(depRel, DependDependerIndexId, true, + NULL, 2, key); + while (HeapTupleIsValid(depTup = systable_getnext(depScan))) + { + Form_pg_depend pg_depend = (Form_pg_depend) GETSTRUCT(depTup); + + /* Prepend oid of the relation with this policy.
Re: issue with synchronized_standby_slots
On Thu, 11 Sept 2025 at 12:00, Rahila Syed wrote: > > Hi, > > >> >> >> BTW, we should also try to conclude on my yesterday's point as to why >> it is okay to have the same behavior for default_tablespace and >> default_table_access_method and not for this parameter? I am asking >> because if we change the current behavior, tomorrow, we can get >> complaints that one expects the old behaviour as that was similar to >> other GUCs like default_tablespace and default_table_access_method. >> > > Fair point. I haven't examined the validation of GUCs in parallel workers > closely, > but one argument for preventing parallel workers from failing due to an > incorrect > value of synchronized_standby_slots is that a select query works in this > situation > without parallel workers. > > Whereas, for incorrect values of default_tablespace and > default_table_access_method, > most commands would fail regardless of whether parallel workers are enabled. > > PFA a test for the original bug report on this thread. This applies on the > v3 version of the patch > that was shared. > Thanks for sharing the patch. I checked the test and it looks good to me. But I am not sure if we should have a new file for the test. I have added the test in the '040_standby_failover_slots_sync.pl' file along with some other tests. Also I have addressed the comments by Ashutosh in [1][2]. I have attached the updated v4 patch [1]: https://www.postgresql.org/message-id/CAE9k0P%3Dx3J3nmSmYKmTkiFXTDKLxJkXFO4%2BVHJyNu01Od6CZfg%40mail.gmail.com [2]: https://www.postgresql.org/message-id/CAE9k0P%3DOFMFCRy9aDGWZ3bt91tbB1WnzsAbzXN72iWBaGVuMrw%40mail.gmail.com Thanks, Shlok Kyal v4-0001-Remove-the-validation-from-the-GUC-check-hook-and.patch Description: Binary data
Re: let ALTER TABLE DROP COLUMN drop whole-row referenced object
On Thu, Sep 11, 2025 at 9:27 AM Chao Li wrote: > > But v2 needs a rebase, I cannot apply it to master. > Hi. Please check the attached v3-0001 and v3-0002. v3-0001: ALTER TABLE DROP COLUMN cascades to drop any whole-row referenced constraints and indexes. v3-0002: ALTER COLUMN SET DATA TYPE errors out when a whole-row referenced constraint exists. With the v3-0002 patch: CREATE TABLE ts (a int, c int, b int constraint cc check((ts = ROW(1,1,1)))); ALTER TABLE ts ALTER COLUMN b SET DATA TYPE INT8; ERROR: cannot alter table column "ts"."b" to type bigint because constraint "cc" uses table "ts" row type HINT: You might need to drop constraint "cc" first to change column "ts"."b" data type Of course, even if we do not error out, a regular insert will fail too. insert into ts values(1,1,1); ERROR: cannot compare dissimilar column types bigint and integer at record column 3 src7=# \errverbose ERROR: 42804: cannot compare dissimilar column types bigint and integer at record column 3 LOCATION: record_eq, rowtypes.c:1193 Then you need to debug to find out that the root cause is that constraint cc is no longer satisfied, and you still need to deal with the broken constraint cc afterward. With the v3-0002 patch, ALTER TABLE SET DATA TYPE provides an explicit error message that helps quickly identify the problem. So I guess it should be helpful. Index expressions/predicates and check constraint expressions cannot contain subqueries; that's why using pull_varattnos to test whole-row containment works fine. But pull_varattnos cannot cope with subqueries (see the comments above pull_varattnos). A row security policy can contain a subquery, for example: CREATE POLICY p1 ON document AS PERMISSIVE USING (dlevel <= (SELECT seclv FROM uaccount WHERE pguser = current_user)); So I am still working on whole-row referenced policies interacting with ALTER TABLE SET DATA TYPE/ALTER TABLE DROP COLUMN. From 090a087da9b7fb072acd7e9683faf9ba2b5c76af Mon Sep 17 00:00:00 2001 From: jian he Date: Fri, 12 Sep 2025 17:10:19 +0800 Subject: [PATCH v3 2/2] disallow ALTER COLUMN SET DATA TYPE when wholerow referenced constraint exists CREATE TABLE ts (a int, c int, b int constraint cc check((ts = ROW(1,1,1)))); INSERT INTO ts values (1,1,1); ALTER TABLE ts ALTER COLUMN b SET DATA TYPE INT8; ERROR: cannot alter table column "ts"."b" to type bigint because constraint "cc" uses table "ts" row type HINT: You might need to drop constraint "cc" first to change column "ts"."b" data type No need to worry about the index; index rebuild will fail automatically. discussion: https://postgr.es/m/cacjufxga6kvqy7dbhglvw9s9kkmpgyzt5me6c7kefjdpr2w...@mail.gmail.com --- src/backend/commands/tablecmds.c | 40 +++ src/test/regress/expected/constraints.out | 11 +++ src/test/regress/sql/constraints.sql | 10 ++ 3 files changed, 61 insertions(+) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 409bc65e53f..62c84726c4e 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -14664,6 +14664,46 @@ ATPrepAlterColumnType(List **wqueue, find_composite_type_dependencies(rel->rd_rel->reltype, rel, NULL); } + /* + * If the table has a whole-row referenced CHECK constraint, then changing + * any column data type is not allowed.
+ */ + if (targettype != attTup->atttypid || targettypmod != attTup->atttypmod) + { + TupleConstr *constr = RelationGetDescr(rel)->constr; + + if (constr && constr->num_check > 0) + { + ConstrCheck *check = constr->check; + Node *check_expr; + + for (int i = 0; i < constr->num_check; i++) + { +Bitmapset *expr_attrs = NULL; + +char *constr_name = check[i].ccname; +check_expr = stringToNode(check[i].ccbin); + +/* Find all attributes referenced */ +pull_varattnos(check_expr, 1, &expr_attrs); + +if (bms_is_member(0 - FirstLowInvalidHeapAttributeNumber, expr_attrs)) + ereport(ERROR, + errcode(ERRCODE_DATATYPE_MISMATCH), + errmsg("cannot alter table column \"%s\".\"%s\" to type %s because constraint \"%s\" uses table \"%s\" row type", + RelationGetRelationName(rel), + colName, + format_type_with_typemod(targettype, targettypmod), + constr_name, + RelationGetRelationName(rel)), + errhint("You might need to drop constraint \"%s\" first to change column \"%s\".\"%s\" data type", + constr_name, + RelationGetRelationName(rel), + colName)); + } + } + } + ReleaseSysCache(tuple); /* diff --git a/src/test/regress/expected/constraints.out b/src/test/regress/expected/constraints.out index ce2fb02971f..6de0cc4ec33 100644 --- a/src/test/regress/expected/constraints.out +++ b/src/test/regress/expected/constraints.out @@ -271,6 +271,17 @@ ALTER TABLE DROP_COL_CHECK_TBL DROP COLUMN city; DROP TABLE DROP_COL_CHECK_TBL; -- +-- Change column data type should error out because the whole-row reference
RE: Conflict detection for update_deleted in logical replication
On Friday, September 12, 2025 4:48 PM shveta malik wrote: > On Fri, Sep 12, 2025 at 8:55 AM Zhijie Hou (Fujitsu) > wrote: > > > > > > I agree. Here is a V73 patch that will restart the worker if the > > retention resumes. I also addressed other comments posted by Amit[1]. > > > > Thanks for the patch. Few comments: Thanks for the comments! > > 1) > There is a small window where the worker can exit while resuming retention and > the launcher can end up accessing stale worker info. > > Let's say the launcher is at a stage where it has fetched the worker: > w = logicalrep_worker_find(sub->oid, InvalidOid, false); > > And after this point, before the launcher could do > compute_min_nonremovable_xid(), the worker has stopped retention and > resumed as well. Now the worker has exited, but in > compute_min_nonremovable_xid() the launcher will still use the worker info > fetched previously. Thanks for catching this. I have fixed it by computing the xid under LogicalRepWorkerLock. > > 2) > > if (should_stop_conflict_info_retention(rdt_data)) > + { > +/* > + * Stop retention if not yet. Otherwise, reset to the initial phase > +to > + * retry all phases. This is required to recalculate the current > +wait > + * time and resume retention if the time falls within > + * max_retention_duration. > + */ > +if (MySubscription->retentionactive) > + rdt_data->phase = RDT_STOP_CONFLICT_INFO_RETENTION; > +else > + reset_retention_data_fields(rdt_data); > + > return; > + } > > > > Shall we have an Assert( !MyLogicalRepWorker->oldest_nonremovable_xid) > in the 'else' part above? Added. Here is the V74 patch which addresses all comments. Best Regards, Hou zj v74-0001-Allow-conflict-relevant-data-retention-to-resume.patch Description: v74-0001-Allow-conflict-relevant-data-retention-to-resume.patch
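For readers following along, the shape of the fix (simplified here; see the attached v74 patch for the real code) is to keep LogicalRepWorkerLock held across both the lookup and the read of the worker's xid, so the worker cannot exit and have its slot reused in between:

/* Simplified sketch of the approach; not the literal v74 code. */
LogicalRepWorker *w;
FullTransactionId candidate_xid;	/* assuming the xid type used by the patch */

LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);

w = logicalrep_worker_find(sub->oid, InvalidOid, false);
if (w != NULL)
{
	/* Safe to read: the worker cannot exit while we hold the lock. */
	candidate_xid = w->oldest_nonremovable_xid;
}

LWLockRelease(LogicalRepWorkerLock);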
Re: How can end users know the cause of LR slot sync delays?
On Fri, Sep 12, 2025 at 1:07 PM Ashutosh Sharma wrote: > > On Mon, Sep 8, 2025 at 9:52 AM Ashutosh Sharma wrote: > > > > I think we can do that, since sync_skip_reason appears to be a > > descriptive metadata rather than statistical data like > > slot_sync_skip_count and last_slot_sync_skip. However, it's also true > > that all three pieces of data are transient by nature - they will just > > be present in the runtime. > > > > After spending some more time on this, I found that maintaining > sync_skip_reason in pg_replication_slots would make the code changes a > bit messy and harder to maintain. > What exactly is your worry? It seems more logical to have sync_skip_reason in pg_replication_slots and other two in pg_stat_replication_slots as the latter is purely a stats view and the sync_skip_count/last_sync_skip suits there better. I think storing all 3 pieces of > information - sync_skip_reason, sync_skip_count, and last_sync_skip in > pg_stat_replication_slots would be a cleaner solution. This way, all > the sync-related info stays together and the code remains > straightforward. > Having all the sync information together has merit but don't you think in this case the sync_skip_reason doesn't seem to be matching with the existing columns in pg_stat_replication_slots? -- With Regards, Amit Kapila.
Re: pg_waldump: support decoding of WAL inside tarfile
On Mon, Sep 8, 2025 at 7:07 PM Jakub Wartak wrote: > > On Tue, Aug 26, 2025 at 1:53 PM Amul Sul wrote: > > > [..patch] > > Hi Amul! > Thanks for your review. I'm replying to a few of your comments now, but for the rest, I need to think about them. I'm kind of in agreement with some of them for the fix, but I won't be able to spend time on that next week due to official travel. I'll try to get back as soon as possible after that. > a. Why should it be necessary to provide startLSN (-s) ? Couldn't > it autodetect the first WAL (tar file) inside and just use that with > some info message? > $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar > pg_waldump: error: no start WAL location given > There are two reasons. First, existing pg_waldump --path=some_directory would result in the same error. Second, it would force us to re-read the archive twice just to locate the first WAL segment, which is inefficient. > c. It doesnt work when using SEGSTART, but it's there: > $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar > 00010059 > pg_waldump: error: could not open file "00010059": > Not a directory > $ tar tf /tmp/base/pg_wal.tar | head -1 > 00010059 > I don't believe this is the correct use case. The WAL files are inside a tar archive, and the requirement is to use a starting LSN and a timeline (if not the default). > d. I've later noticed that follow-up patches seem to use the > -s switch and there it seems to work OK. The above SEGSTART issue was > not detected, probably because tests need to be extended cover of > segment name rather than just --start LSN (see test_pg_waldump): > $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar --stats > -s 0/59000358 > pg_waldump: first record is after 0/59000358, at 0/590003E8, > skipping over 144 bytes > WAL statistics between 0/590003E8 and 0/6100: > [..] > Hope previous reasoning makes sense to you. > e. Code around`if (walpath == NULL && directory != NULL)` needs > some comments. > I think this is an existing one. > f. Code around `if (fname != NULL && is_tar_file(fname, > &compression))` , so if fname is WAL segment here > (0001005A) and we do check again if that has been > tar-ed (is_tar_file())? Why? > Again, how? > g. Just a question: the commit message says `Note that this patch > requires that the WAL files within the archive be in sequential order; > an error will be reported otherwise`. I'm wondering if such > occurrences are known to be happening in the wild? Or is it just an > assumption that if someone would modify the tar somehow? (either way > we could just add a reason why we need to handle such a case if we > know -- is manual alternation the only source of such state?). For the > record, I've tested crafting custom archives with out of sequence WAL > archives and the code seems to work (it was done using: tar --append > -f pg_wal.tar --format=ustar ..) > This is an almost nonexistent occurrence. While pg_basebackup archives WAL files in sequential order, we don't have an explicit code to enforce that order within it. Furthermore, since we can't control how external tools might handle the files, this extra precaution is necessary. > Another open question I have is this: shouldn't backup_manifest come > with CRC checksum for the archived WALs? Or does that guarantee that > backup_manifest WAL-Ranges are present in pg_wal.tar is good enough > because individual WAL files are CRC-protected itself? > I don't know, I have to check pg_verifybackup. Regards, Amul
Re: Error with DEFAULT VALUE in temp table
Hi, Tom! Thank you for working on this. I see you've fixed the patch and committed it as a0b99fc1220. I tested it a bit and see some side effects which may be unintentional. 1. SCHEMA lost object_name. Before: postgres=# create schema foo; CREATE SCHEMA postgres=# drop schema foo; DROP SCHEMA postgres=# select * from dropped_objects \gx -[ RECORD 1 ]---+--- n | 1 classid | 2615 objid | 16404 objsubid| 0 original| t normal | f is_temporary| f object_type | schema schema_name | object_name | foo object_identity | foo address_names | {foo} address_args| {} After: postgres=# select * from dropped_objects \gx -[ RECORD 1 ]---+--- n | 1 classid | 2615 objid | 16394 objsubid| 0 original| t normal | f is_temporary| f object_type | schema schema_name | object_name | object_identity | foo address_names | {foo} address_args| {} 2. DEFAULT VALUE now has schema_name and object_name. Before: postgres=# create temp table bar (a int default 0); CREATE TABLE postgres=# drop table bar; DROP TABLE postgres=# select * from dropped_objects where object_type = 'default value' \gx -[ RECORD 1 ]---+-- n | 4 classid | 2604 objid | 16422 objsubid| 0 original| f normal | f is_temporary| f object_type | default value schema_name | object_name | object_identity | for pg_temp.bar.a address_names | {pg_temp,bar,a} address_args| {} After: postgres=# select * from dropped_objects where object_type = 'default value' \gx -[ RECORD 1 ]---+-- n | 4 classid | 2604 objid | 16430 objsubid| 0 original| f normal | f is_temporary| t object_type | default value schema_name | pg_temp object_name | bar object_identity | for pg_temp.bar.a address_names | {pg_temp,bar,a} address_args| {} This may be intentional, but doesn't quite match the description for object_name in the docs: Name of the object, if the combination of schema and name can be used as a unique identifier for the object; otherwise NULL. Also it doesn't match with the record for the column itself: postgres=# create temp table bar (a int default 0); CREATE TABLE postgres=# alter table bar drop column a; ALTER TABLE postgres=# select * from dropped_objects \gx -[ RECORD 1 ]---+-- n | 1 classid | 1259 objid | 16435 objsubid| 1 original| t normal | f is_temporary| t object_type | table column schema_name | pg_temp object_name | object_identity | pg_temp.bar.a address_names | {pg_temp,bar,a} address_args| {} -[ RECORD 2 ]---+-- n | 2 classid | 2604 objid | 16438 objsubid| 0 original| f normal | f is_temporary| t object_type | default value schema_name | pg_temp object_name | bar object_identity | for pg_temp.bar.a address_names | {pg_temp,bar,a} address_args| {} object_name is null for the table column, but not null for its default value. As for schema_name, I'm not sure whether it should be null or not. Currently schema_name is null for triggers and policy objects, but that may be accidental. Best regards, -- Sergey Shinderukhttps://postgrespro.com/
Re: New recovery_target_timeline=primary option
Even the documentation states/warns: "Set restore_command to a simple command to copy files from the WAL archive. If you plan to have multiple standby servers for high availability purposes, make sure that recovery_target_timeline is set to latest (the default), to make the standby server follow the timeline change that occurs at failover to another standby." By default, recovery_target_timeline is set to latest. What I'm recommending is an option to set it to just follow or stay within the primary's timeline without having to receive the fatal message stated before that ends up stopping the recovery of the standby. Suppose we have timelines 1-3 archived in our backup repo. Currently our streaming replication setup is running in timeline 3. But now, we need to restore the primary to timeline 2. We can specify recovery_target_timeline=2 to initially restore the primary. But when I go to reinit or rebuild the standby, why not just add a new option, recovery_target_timeline=primary, that forces the standby to just stay on the primary's timeline without having to figure out the correct timeline for the standby. Without this new parameter or without specifying the timeline when restoring the standby, the restore will take the standby to timeline 3 and get the fatal error message. This happens a lot on setups using tools like patroni. Just trying to make administrators' and HA tools' lives a little easier when setting up a standby. On Thu, Sep 11, 2025 at 9:19 PM, Euler Taveira wrote: On Thu, Sep 11, 2025, at 10:07 PM, Efrain J. Berdecia wrote: > The error I would like to address with this feature is the following: > > FATAL: highest timeline xxx of the primary is behind timeline yyy > It seems your procedure to set up a standby is incorrect. See [1]. You are not using the base backup from the primary server. You didn't describe the whole procedure so it is hard to point out where the problem is. [1] https://www.postgresql.org/docs/current/warm-standby.html#STANDBY-SERVER-SETUP -- Euler Taveira EDB https://www.enterprisedb.com/
Preferred use of macro GetPGProcByNumber
Hello hackers, I've noticed some places where elements of ProcGlobal->allProcs are addressed directly via arr index. But in proc.h macros GetPGProcByNumber exist, that was added to get proc objects by index, so I suggest minor refactoring 'ProcGlobal->allProcs[index]' -> 'GetPGProcByNumber(index)'. Please, see attached patch, branched from rev 6ede13d1b5f. Best regards Melnikov Maksim From f03b596e8fd3fa80f86843babb1704fae4032480 Mon Sep 17 00:00:00 2001 From: Maksim Melnikov Date: Fri, 12 Sep 2025 15:17:39 +0300 Subject: [PATCH v1] Preferred use of macro GetPGProcByNumber. Minor refactoring, using of GetPGProcByNumber(index) macros instead of ProcGlobal->allProcs[index]. --- src/backend/access/transam/clog.c | 4 +- src/backend/postmaster/pgarch.c| 2 +- src/backend/postmaster/walsummarizer.c | 2 +- src/backend/storage/buffer/freelist.c | 2 +- src/backend/storage/ipc/procarray.c| 64 -- src/backend/storage/lmgr/lock.c| 6 +-- src/backend/storage/lmgr/proc.c| 2 +- 7 files changed, 39 insertions(+), 43 deletions(-) diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c index e80fbe109cf..bbdcce39990 100644 --- a/src/backend/access/transam/clog.c +++ b/src/backend/access/transam/clog.c @@ -574,7 +574,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status, /* Walk the list and update the status of all XIDs. */ while (nextidx != INVALID_PROC_NUMBER) { - PGPROC *nextproc = &ProcGlobal->allProcs[nextidx]; + PGPROC *nextproc = GetPGProcByNumber(nextidx); int64 thispageno = nextproc->clogGroupMemberPage; /* @@ -633,7 +633,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status, */ while (wakeidx != INVALID_PROC_NUMBER) { - PGPROC *wakeproc = &ProcGlobal->allProcs[wakeidx]; + PGPROC *wakeproc = GetPGProcByNumber(wakeidx); wakeidx = pg_atomic_read_u32(&wakeproc->clogGroupNext); pg_atomic_write_u32(&wakeproc->clogGroupNext, INVALID_PROC_NUMBER); diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c index 78e39e5f866..a35895babeb 100644 --- a/src/backend/postmaster/pgarch.c +++ b/src/backend/postmaster/pgarch.c @@ -289,7 +289,7 @@ PgArchWakeup(void) * be relaunched shortly and will start archiving. */ if (arch_pgprocno != INVALID_PROC_NUMBER) - SetLatch(&ProcGlobal->allProcs[arch_pgprocno].procLatch); + SetLatch(&GetPGProcByNumber(arch_pgprocno)->procLatch); } diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c index e1f142f20c7..69461ea9a79 100644 --- a/src/backend/postmaster/walsummarizer.c +++ b/src/backend/postmaster/walsummarizer.c @@ -649,7 +649,7 @@ WakeupWalSummarizer(void) LWLockRelease(WALSummarizerLock); if (pgprocno != INVALID_PROC_NUMBER) - SetLatch(&ProcGlobal->allProcs[pgprocno].procLatch); + SetLatch(&GetPGProcByNumber(pgprocno)->procLatch); } /* diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c index 01909be0272..2fbd254e4bc 100644 --- a/src/backend/storage/buffer/freelist.c +++ b/src/backend/storage/buffer/freelist.c @@ -239,7 +239,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state, bool *from_r * actually fine because procLatch isn't ever freed, so we just can * potentially set the wrong process' (or no process') latch. 
*/ - SetLatch(&ProcGlobal->allProcs[bgwprocno].procLatch); + SetLatch(&GetPGProcByNumber(bgwprocno)->procLatch); } /* diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c index 200f72c6e25..d3b9680b562 100644 --- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c @@ -268,8 +268,6 @@ typedef enum KAXCompressReason static ProcArrayStruct *procArray; -static PGPROC *allProcs; - /* * Cache to reduce overhead of repeated calls to TransactionIdIsInProgress() */ @@ -444,8 +442,6 @@ ProcArrayShmemInit(void) TransamVariables->xactCompletionCount = 1; } - allProcs = ProcGlobal->allProcs; - /* Create or attach to the KnownAssignedXids arrays too, if needed */ if (EnableHotStandby) { @@ -502,7 +498,7 @@ ProcArrayAdd(PGPROC *proc) int this_procno = arrayP->pgprocnos[index]; Assert(this_procno >= 0 && this_procno < (arrayP->maxProcs + NUM_AUXILIARY_PROCS)); - Assert(allProcs[this_procno].pgxactoff == index); + Assert(GetPGProcByNumber(this_procno)->pgxactoff == index); /* If we have found our right position in the array, break */ if (this_procno > pgprocno) @@ -538,9 +534,9 @@ ProcArrayAdd(PGPROC *proc) int procno = arrayP->pgprocnos[index]; Assert(procno >= 0 && procno < (arrayP->maxProcs + NUM_AUXILIARY_PROCS)); - Assert(allProcs[procno].pgxactoff == index - 1); + Assert(GetPGProcByNumber(procno)->pgxactoff == index - 1); - allProcs[procno].pgxactoff = index; + GetPGProcByNumber(procno)->pgxactoff = index; } /* @@ -
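For illustration, a minimal sketch of the substitution the patch applies. Only the GetPGProcByNumber() macro itself comes from proc.h, where it is essentially &ProcGlobal->allProcs[(n)]; the wrapper function and its name below are hypothetical and exist only to show the before/after spellings side by side.

#include "postgres.h"
#include "storage/latch.h"
#include "storage/proc.h"

/* hypothetical helper, for illustration only */
static void
wake_proc_by_number(int procno)
{
	/* before: open-coded array indexing */
	SetLatch(&ProcGlobal->allProcs[procno].procLatch);

	/* after: the same access through the accessor macro */
	SetLatch(&GetPGProcByNumber(procno)->procLatch);
}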
Re: Error with DEFAULT VALUE in temp table
On 12.09.2025 14:01, Sergey Shinderuk wrote: object_name is null for the table column, but not null for its default value. As for schema_name, I'm not sure whether it should be null or not. Currently schema_name is null for triggers and policy objects, but that may be accidental. Perhaps "default value" should be like "table constraint", which has schema_name and a null object_name. postgres=# create temp table bar (a int not null default 0); CREATE TABLE postgres=# alter table bar drop column a; ALTER TABLE postgres=# select * from dropped_objects \gx -[ RECORD 1 ]---+-- n | 1 classid | 1259 objid | 16445 objsubid| 1 original| t normal | f is_temporary| t object_type | table column schema_name | pg_temp object_name | object_identity | pg_temp.bar.a address_names | {pg_temp,bar,a} address_args| {} -[ RECORD 2 ]---+-- n | 2 classid | 2604 objid | 16448 objsubid| 0 original| f normal | f is_temporary| t object_type | default value schema_name | pg_temp object_name | bar object_identity | for pg_temp.bar.a address_names | {pg_temp,bar,a} address_args| {} -[ RECORD 3 ]---+-- n | 3 classid | 2606 objid | 16449 objsubid| 0 original| f normal | f is_temporary| t object_type | table constraint schema_name | pg_temp object_name | object_identity | bar_a_not_null on pg_temp.bar address_names | {pg_temp,bar,bar_a_not_null} address_args| {} Best regards, -- Sergey Shinderuk https://postgrespro.com/
Re: [PATCH] jit: fix build with LLVM-21
On 2025-09-12 09:47, Holger Hoffstätte wrote: On 2025-09-12 08:36, Peter Eisentraut wrote: In my testing with LLVM 21, I'm getting an additional error: ../src/backend/jit/llvm/llvmjit_wrap.cpp:56:18: error: no matching constructor for initialization of 'llvm::orc::RTDyldObjectLinkingLayer' 56 | return wrap(new llvm::orc::RTDyldObjectLinkingLayer( | ^ 57 | *unwrap(ES), [] { return std::make_unique(nullptr, true); })); | ~ /opt/homebrew/Cellar/llvm/21.1.0/include/llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h:58:3: note: candidate constructor not viable: no known conversion from '(lambda at ../src/backend/jit/llvm/llvmjit_wrap.cpp:57:16)' to 'GetMemoryManagerFunction' (aka 'unique_function (const MemoryBuffer &)>') for 2nd argument 58 | RTDyldObjectLinkingLayer(ExecutionSession &ES, | ^ 59 | GetMemoryManagerFunction GetMemoryManager); | ~ /opt/homebrew/Cellar/llvm/21.1.0/include/llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h:37:16: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 2 were provided 37 | class LLVM_ABI RTDyldObjectLinkingLayer | ^~~~ I gather you're not seeing that? I temporarily removed the __arch64__ guard and can reproduce the above error, using either gcc-15 or clang-21. It seems this llvm monkey patch backport thing is affected by the following commit in llvm-21: https://github.com/llvm/llvm-project/commit/cd585864c0bbbd74ed2a2b1ccc191eed4d1c8f90 Trying to figure out how to adapt the code.. chers Holger
Re: Add support for specifying tables in pg_createsubscriber.
On Fri, Sep 12, 2025, at 2:05 AM, Amit Kapila wrote: > Yeah, I am also not sure that it is worth adding a new option to save > backward compatibility for this option. It seems intuitive to use > existing publications when they exist, though we should clearly > document it. > That's why we have a "Migration to Version XY" in the release notes. ;) -- Euler Taveira EDB https://www.enterprisedb.com/
Re: misleading error message in ProcessUtilitySlow T_CreateStatsStmt
On 05.09.25 14:30, Peter Eisentraut wrote: On 29.08.25 14:48, Álvaro Herrera wrote: On 2025-Aug-29, jian he wrote: On Fri, Aug 29, 2025 at 5:46 AM Tom Lane wrote: WFM, although I think you could shorten it to "tables, materialized views, and foreign tables". We generally expect that partitioned tables are included when saying "tables", no? I'm not dead set on that either way, though. https://www.postgresql.org/docs/current/sql-copy.html use "COPY TO can be used only with plain tables, not views, and does not copy rows from child tables or child partitions" I'm inclined to think that we should only mention partitioned tables specifically when they for some reason deviate from what we do for regular tables, i.e., what Tom is saying. I don't think we've had an explicit, consistent rule for that thus far, so there may be places where we fail to follow it. Anyway, I have pushed the error message change. I think this message is still wrong. The check doesn't even look at the relation kind, which is what the message is implying. (If the message were about relkinds, then it should use errdetail_relkind_not_supported().) It checks that the from list entry is a table name instead of some other thing like VALUES or JOIN. So it should be something like CREATE STATISTICS only supports plain table names in the FROM clause I propose the attached patch to fix this. I think this restores the original meaning better. From 7c55d765dc1ec2f8a28e8ae1fe3551d32ed5a964 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Fri, 12 Sep 2025 15:30:16 +0200 Subject: [PATCH v4] CREATE STATISTICS: improve misleading error message The previous change (commit f225473cbae) was still not on target, because it talked about relation kinds, which are not what is being checked here. Provide a more accurate message. Discussion: https://postgr.es/m/CACJufxEZ48toGH0Em_6vdsT57Y3L8pLF=DZCQ_gCii6=c3m...@mail.gmail.com --- src/backend/tcop/utility.c | 5 ++--- src/test/regress/expected/stats_ext.out | 21 +++-- 2 files changed, 9 insertions(+), 17 deletions(-) diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c index 5f442bc3bd4..68605656b31 100644 --- a/src/backend/tcop/utility.c +++ b/src/backend/tcop/utility.c @@ -1873,9 +1873,8 @@ ProcessUtilitySlow(ParseState *pstate, if (!IsA(rel, RangeVar)) ereport(ERROR, - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), -errmsg("cannot create statistics on the specified relation"), - errdetail("CREATE STATISTICS only supports tables, foreign tables and materialized views."))); + (errcode(ERRCODE_SYNTAX_ERROR), +errmsg("CREATE STATISTICS only supports relation names in the FROM clause"))); /* * CREATE STATISTICS will influence future execution plans diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out index a1f83b58b23..fdc0aa130bd 100644 --- a/src/test/regress/expected/stats_ext.out +++ b/src/test/regress/expected/stats_ext.out @@ -56,29 +56,22 @@ CREATE STATISTICS tst (unrecognized) ON x, y FROM ext_stats_test; ERROR: unrecognized statistics kind "unrecognized" -- unsupported targets CREATE STATISTICS tst ON a FROM (VALUES (x)) AS foo; -ERROR: cannot create statistics on the specified relation -DETAIL: CREATE STATISTICS only supports tables, foreign tables and materialized views. 
+ERROR: CREATE STATISTICS only supports relation names in the FROM clause CREATE STATISTICS tst ON a FROM foo NATURAL JOIN bar; -ERROR: cannot create statistics on the specified relation -DETAIL: CREATE STATISTICS only supports tables, foreign tables and materialized views. +ERROR: CREATE STATISTICS only supports relation names in the FROM clause CREATE STATISTICS tst ON a FROM (SELECT * FROM ext_stats_test) AS foo; -ERROR: cannot create statistics on the specified relation -DETAIL: CREATE STATISTICS only supports tables, foreign tables and materialized views. +ERROR: CREATE STATISTICS only supports relation names in the FROM clause CREATE STATISTICS tst ON a FROM ext_stats_test s TABLESAMPLE system (x); -ERROR: cannot create statistics on the specified relation -DETAIL: CREATE STATISTICS only supports tables, foreign tables and materialized views. +ERROR: CREATE STATISTICS only supports relation names in the FROM clause CREATE STATISTICS tst ON a FROM XMLTABLE('foo' PASSING 'bar' COLUMNS a text); -ERROR: cannot create statistics on the specified relation -DETAIL: CREATE STATISTICS only supports tabl
Re: Cannot find a working 64-bit integer type on Illumos
On 04.09.25 11:01, Peter Eisentraut wrote: On 03.09.25 17:04, Peter Eisentraut wrote: Consider a third-party extension that does something like dblink or postgres_fdw. It will compile against a server and also a libpq. The server and the libpq might not be of the same major version. (On Debian, only the latest libpq will be available.) If you have for example server version 17 and libpq version 18, then you will get the pg_int64 typedef both from postgres_ext.h (from the PG17 server includes) and from libpq-fe.h (from PG18 libpq). That is not allowed in C99, and even if it were, the underlying types of PG_INT64_TYPE (in PG17) and int64_t (in PG18) might be different (long int vs. long long int) and this would fail. I think this could be fixed by moving the definition of pg_int64 back to postgres_ext.h. Then extension builds would only get one definition, because of the header guards. Depending on include order, they could get a different underlying type, but that's a smaller problem, since the type is supposed to be deprecated anyway. Here is a patch that has been reported to fix the problem. I propose to go ahead with this patch in a few days if there are no other solutions coming.
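To illustrate why moving the typedef back behind postgres_ext.h's include guard avoids the duplicate-typedef problem, here is a toy, self-contained reproduction. The two guarded blocks stand in for the PG17 and PG18 copies of the header; the underlying types chosen for each "copy" are made up for the example.

#include <stdio.h>

/* stand-in for the PG17 server copy of postgres_ext.h */
#ifndef POSTGRES_EXT_H
#define POSTGRES_EXT_H
typedef long int pg_int64;			/* say PG_INT64_TYPE resolved to long int */
#endif

/* stand-in for the PG18 libpq copy of postgres_ext.h: same guard, so the
 * potentially conflicting typedef below is never compiled */
#ifndef POSTGRES_EXT_H
#define POSTGRES_EXT_H
typedef long long int pg_int64;
#endif

int
main(void)
{
	pg_int64	x = 42;

	printf("%lld\n", (long long) x);
	return 0;
}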
Re: [BUG] temporary file usage report with extended protocol and unnamed portals
Hi, On 11/09/2025 16:28, Sami Imseih wrote: I prefer a separate thread for this, as it's an optimization of the existing behavior. OK, I removed my changes to CreatePortal. I left the test description fixes and added a proposed commit message. If no objections I will set it to ready for committer in the app. Kind regards, Mircea Cadariu From 268e2a5bac002ed32db4b2b8ab5dc4bb4f64c290 Mon Sep 17 00:00:00 2001 From: "Sami Imseih (AWS)" Date: Fri, 29 Aug 2025 17:19:34 + Subject: [PATCH v10] =?UTF-8?q?Fix=20incorrect=20query=20attribution=20in?= =?UTF-8?q?=20temporary=E2=80=90file=20logging?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Previously, in the extended query protocol with unnamed portals, the log line for temporary files would show the wrong STATEMENT: line (e.g. from a subsequent simple query) rather than the query that actually caused the temp file. This happened because the unnamed portal from the previous Bind was dropped after setting the new debug_query_string, so the logging triggered by PortalDrop used the new query text instead of the one that had generated the temporary file. This patch ensures that when an unnamed portal is dropped, it is associated with the correct query text by dropping the previous unnamed portal before assigning the debug_query_string. Author: Sami Imseih Author: Frédéric Yhuel Reviewed-by: Mircea Cadariu Discussion: https://www.postgresql.org/message-id/flat/3d07ee43-8855-42db-97e0-bad5db82d...@dalibo.com --- src/backend/tcop/postgres.c | 35 ++ src/test/modules/test_misc/meson.build| 1 + .../modules/test_misc/t/009_log_temp_files.pl | 115 ++ 3 files changed, 151 insertions(+) create mode 100644 src/test/modules/test_misc/t/009_log_temp_files.pl diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index d356830f75..d74e9e7bd1 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -148,6 +148,7 @@ static bool ignore_till_sync = false; * in order to reduce overhead for short-lived queries. */ static CachedPlanSource *unnamed_stmt_psrc = NULL; +static bool unnamed_portal = false; /* assorted command-line switches */ static const char *userDoption = NULL; /* -D switch */ @@ -182,6 +183,7 @@ static bool IsTransactionExitStmt(Node *parsetree); static bool IsTransactionExitStmtList(List *pstmts); static bool IsTransactionStmtList(List *pstmts); static void drop_unnamed_stmt(void); +static void drop_unnamed_portal(void); static void log_disconnections(int code, Datum arg); static void enable_statement_timeout(void); static void disable_statement_timeout(void); @@ -1024,6 +1026,12 @@ exec_simple_query(const char *query_string) booluse_implicit_block; charmsec_str[32]; + /* +* Drop the unnamed portal before setting debug_query_string, to avoid +* attributing messages from the drop (e.g., temp usage) to the new query. +*/ + drop_unnamed_portal(); + /* * Report query to various monitoring facilities. */ @@ -1676,6 +1684,12 @@ exec_bind_message(StringInfo input_message) errmsg("unnamed prepared statement does not exist"))); } + /* +* Same as exec_simple_query, drop the unnamed portal before setting +* debug_query_string. +*/ + drop_unnamed_portal(); + /* * Report query to various monitoring facilities. */ @@ -1757,10 +1771,14 @@ exec_bind_message(StringInfo input_message) * if the unnamed portal is specified. 
*/ if (portal_name[0] == '\0') + { portal = CreatePortal(portal_name, true, true); + unnamed_portal = true; + } else portal = CreatePortal(portal_name, false, false); + /* * Prepare to copy stuff into the portal's memory context. We do all this * copying first, because it could possibly fail (out-of-memory) and we @@ -5236,3 +5254,20 @@ disable_statement_timeout(void) if (get_timeout_active(STATEMENT_TIMEOUT)) disable_timeout(STATEMENT_TIMEOUT, false); } + +/* Drop the unnamed portal if one exists */ +static void +drop_unnamed_portal(void) +{ + Portal portal; + + if (!unnamed_portal) + return; + + /* Get the portal and drop it */ + portal = GetPortalByName(""); + if (PortalIsValid(portal)) + PortalDrop(portal, false); + + unnamed_portal = false; +} diff --git a/src/test/modules/test_misc/meson.build b/src/test/modules/test_misc/meson.build index 6b1e730bf4..f258bf1ccd 100644 --- a/src/test/modules/test_misc/meson.build +++ b/src/test/modules/test_misc/meson.build @@ -17,6 +17,7 @@ tests += { 't/006_signal_autovacuum.pl', 't/007_catcache_inval.pl',
Re: misleading error message in ProcessUtilitySlow T_CreateStatsStmt
Peter Eisentraut writes: > I propose the attached patch to fix this. I think this restores the > original meaning better. I'm okay with this wording change, but I would stay with ERRCODE_FEATURE_NOT_SUPPORTED rather than calling this a "syntax error". It's not a syntax error IMV, but a potential feature that we have deliberately left syntax space for, even though we don't yet have ideas about a workable implementation. regards, tom lane
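Putting the two suggestions together, Peter's rewording with the original errcode retained, the report might end up looking roughly like this (a sketch of where the thread seems to be converging, not the committed change):

ereport(ERROR,
		(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
		 errmsg("CREATE STATISTICS only supports relation names in the FROM clause")));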
Re: split func.sgml to separated individual sgml files
Hi, On Tue, 2 Sept 2025 at 17:54, Andrew Dunstan wrote: > > Ah, you’re right, but then again, I’d expect ALL_SGML to be used > consistently, but it isn't and I didn't check. > v3 does that. > Note that GENERATED_SGML where'te included in these two targets but I think > there's no harm in checking them too. > > Do we actually care about those? I don't want to add needless cycles > anywhere. I note that the meson.build doesn't appear to have a check target > at all, or anything that looks for hard tabs or nbsps.Those checks were added > to the Makefile back in October in commit 5b7da5c261d, but that got missed > even though Daniel had mentioned it in the discussion thread.[1] I have been working on running these checks under the Meson build system. To do this, I converted the checks into a Perl script (sgml_syntax_check) and ran it against both the Makefile and Meson. Test's name is 'sgml_syntax_check' in the Meson. One difference I noticed: I could not find a way in Meson to create a test that does not run by default. As a result, this syntax test runs every time you run the 'meson test'. This behaviour differs from Autoconf, but I think it is acceptable. Additionally, some of the CI OSes were missing docbook-xml; but it has now been installed. I did not create a new thread for that, I can create one if you think that it would be better. CI run with the attached patch applied: https://cirrus-ci.com/build/6610354173640704 -- Regards, Nazir Bilal Yavuz Microsoft From 27ab61775945d837e37ed6a0ce0c301697d183a1 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Mon, 8 Sep 2025 17:16:05 +0300 Subject: [PATCH v1] Add sgml_syntax_check test to the Meson build The 'sgml' check from the Makefile has been converted into a Perl script (sgml_syntax_check) and integrated into meson.build. Unlike Autoconf, Meson does not provide a way to mark tests as non-default, so this script runs on every 'meson test'. While this differs from the previous behavior, it is considered acceptable. --- doc/src/sgml/Makefile | 16 +--- doc/src/sgml/meson.build| 23 ++ doc/src/sgml/t/sgml_syntax_check.pl | 118 .cirrus.tasks.yml | 3 + 4 files changed, 146 insertions(+), 14 deletions(-) create mode 100755 doc/src/sgml/t/sgml_syntax_check.pl diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile index 11aac913812..3256340a5b2 100644 --- a/doc/src/sgml/Makefile +++ b/doc/src/sgml/Makefile @@ -200,8 +200,8 @@ MAKEINFO = makeinfo ## # Quick syntax check without style processing -check: postgres.sgml $(ALL_SGML) check-tabs check-nbsp - $(XMLLINT) $(XMLINCLUDE) --noout --valid $< +check: postgres.sgml $(ALL_SGML) + $(PERL) $(srcdir)/t/sgml_syntax_check.pl --xmllint "$(XMLLINT)" --srcdir $(srcdir) ## @@ -261,18 +261,6 @@ clean-man: endif # sqlmansectnum != 7 -# tabs are harmless, but it is best to avoid them in SGML files -check-tabs: - @( ! grep ' ' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \ - (echo "Tabs appear in SGML/XML files" 1>&2; exit 1) - -# Non-breaking spaces are harmless, but it is best to avoid them in SGML files. -# Use perl command because non-GNU grep or sed could not have hex escape sequence. 
-check-nbsp: - @ ( $(PERL) -ne '/\xC2\xA0/ and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \ - $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl $(srcdir)/images/*.xsl) ) || \ - (echo "Non-breaking spaces appear in SGML/XML files" 1>&2; exit 1) - ## ## Clean ## diff --git a/doc/src/sgml/meson.build b/doc/src/sgml/meson.build index 6ae192eac68..89d8b01c944 100644 --- a/doc/src/sgml/meson.build +++ b/doc/src/sgml/meson.build @@ -306,3 +306,26 @@ endif if alldocs.length() != 0 alias_target('alldocs', alldocs) endif + +sgml_syntax_check = files( + 't/sgml_syntax_check.pl' +) + +test( + 'sgml_syntax_check', + perl, + protocol: 'exitcode', + suite: 'doc', + args: [ +sgml_syntax_check, +'--xmllint', + '@0@ --nonet'.format(xmllint_bin.full_path()), +'--srcdir', + meson.current_source_dir(), +'--builddir', + meson.current_build_dir(), + ], + depends: doc_generated +) + +testprep_targets += doc_generated diff --git a/doc/src/sgml/t/sgml_syntax_check.pl b/doc/src/sgml/t/sgml_syntax_check.pl new file mode 100755 index 000..7ff1d9a7c26 --- /dev/null +++ b/doc/src/sgml/t/sgml_syntax_check.pl @@ -0,0 +1,118 @@ +# /usr/bin/perl + +# doc/src/sgml/sgml_syntax_check.pl + +use strict; +use warnings FATAL => 'all'; +use Getopt::Long; + +use File::Find; + +my $xmllint; +my $srcdir; +my $builddir; + +GetOptions( + 'xmllint:s' => \$xmllint, + 'srcdir:s' => \$srcdir, + 'builddir:s' => \$builddir) or die "$0: wrong arguments"; + +die "$0: --srcdir must be specified\n" unless defined $srcdir; + +my $postgres_sgml = "postgres.sgml"; +my $xmlinclude = "--path . --path $srcdir"; +
Re: [PATCH] jit: fix build with LLVM-21
On 2025-09-12 14:47, Holger Hoffstätte wrote: I temporarily removed the __arch64__ guard and can reproduce the above error, using either gcc-15 or clang-21. It seems this llvm monkey patch backport thing is affected by the following commit in llvm-21: https://github.com/llvm/llvm-project/commit/cd585864c0bbbd74ed2a2b1ccc191eed4d1c8f90 Trying to figure out how to adapt the code.. Try the attached patch on your homebrew setup. This compiles and passes "make check", though I do not think it runs any jit tests. So intead I dropped this into my 17.6, disabled the __arch64__ guard to make sure I get the backport class, rebuilt and ran the jit on/off example from the docs. This showed the expected performance difference with jit=on vs. off. cheers Holger >From da8c15ef4f8dbfe931fe2700d849f49078bfca97 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Holger=20Hoffst=C3=A4tte?= Date: Fri, 12 Sep 2025 16:01:15 +0200 Subject: [PATCH] jit: update aarch64-only use of SectionMemoryManager for LLVM-21 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit LLVM-21 changed the invocation of RTDyldObjectLinkingLayer, breaking compilation with the backported SectionMemoryManager for aarch64. Introduce a version guard for the newly passed parameter. Signed-off-by: Holger Hoffstätte --- src/backend/jit/llvm/llvmjit_wrap.cpp | 7 +++ 1 file changed, 7 insertions(+) diff --git a/src/backend/jit/llvm/llvmjit_wrap.cpp b/src/backend/jit/llvm/llvmjit_wrap.cpp index da850d67ab6..d98e3d0d900 100644 --- a/src/backend/jit/llvm/llvmjit_wrap.cpp +++ b/src/backend/jit/llvm/llvmjit_wrap.cpp @@ -53,7 +53,14 @@ DEFINE_SIMPLE_CONVERSION_FUNCTIONS(llvm::orc::ObjectLayer, LLVMOrcObjectLayerRef LLVMOrcObjectLayerRef LLVMOrcCreateRTDyldObjectLinkingLayerWithSafeSectionMemoryManager(LLVMOrcExecutionSessionRef ES) { +#if LLVM_VERSION_MAJOR >= 21 + return wrap(new llvm::orc::RTDyldObjectLinkingLayer( + *unwrap(ES), [](const llvm::MemoryBuffer &) { + return std::make_unique(nullptr, true); + })); +#else return wrap(new llvm::orc::RTDyldObjectLinkingLayer( *unwrap(ES), [] { return std::make_unique(nullptr, true); })); +#endif } #endif -- 2.51.0
Re: Issue with logical replication slot during switchover
Thanks Shveta for all your comments. I'll prepare a version 2 of the patch including your proposals. Regards Fabrice On Thu, 11 Sep 2025, 07:42 shveta malik wrote: > On Sun, Sep 7, 2025 at 1:32 PM Fabrice Chapuis > wrote: > > > > Hi, > > Here the version 1 of the patch with the modifications. > > I do a git diff --check until there are no more errors. > > In a next version of the patch, I think I have make change to call > ReplicationSlotCreate() function with the value of the flag > allow_overwrite be consistent > > In which flow? I see all ReplicationSlotCreate() references already > accepting the value. > > > and modify the interface ReplicationSlotCreate to display the attribute > allow_overwrite. > > > > Do you mean to modify pg_replication_slots? > > Thank You for the patch. Please find a few comments: > > 1) > In both create_physical_replication_slot() and > create_physical_replication_slot(): > + false, false,false); > > ,false --> , false (space after comma is recommended) > > 2) > + elog(DEBUG1, "logical replication slot %s created with option > allow_overwrite to %s", > + NameStr(slot->data.name), slot->data.allow_overwrite ? "true" : > "false"); > > IMO, we don't need this as we do not log other properties of the slot as > well. > > 3) > We can have pg_replication_slots (pg_get_replication_slots) modified > to display the new property. Also we can modify docs to have the new > property defined. > > 4) > + { > + /* Check if we need to overwrite an existing logical slot */ > + if (allow_overwrite) > + { > + /* Get rid of a replication slot that is no longer wanted */ > + ReplicationSlotDrop(remote_slot->name,true); > + > + /* Get rid of a replication slot that is no longer wanted */ > + ereport(WARNING, > + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("slot \"%s\" already exists" > + " on the standby but it will be dropped because " > + "flag allow_overwrite is set to true", > + remote_slot->name)); > + > + /* Going back to the main loop after droping the failover slot */ > + return false; > > Currently we are returning after dropping the slot. But I think we > shall drop the slot and proceed with new slot creation of the same > name otherwise we may be left with old slot dropped and no new slot > created specially when API is run. > > Scenario: > --On primary, create 2 slots: slot1 and slot2. > --On standby, create one slot slot1 with allow_overwrite=true. > --Run 'SELECT pg_sync_replication_slots();' on standby. > > At the end of API, expectation is that both slots will be present with > 'synced'=true, but only slot2 is present. if we run this API next > time, slot1 is created. It should have been dropped and recreated (or > say overwritten) in the first run itself. > > 5) > IMO, LOG is sufficient here, as the action aligns with the > user-provided 'allow_overwrite' setting. > > thanks > Shveta >
Re: ABI Compliance Checker GSoC Project
On 12.09.25 16:52, David E. Wheeler wrote: On Sep 12, 2025, at 10:37, Peter Eisentraut wrote: This was a change that was intentionally backpatched in a different way in order to preserve ABI compatibility. Compare commits 344662848ac on REL_18_STABLE and 0b934d3994f on REL_17_STABLE. So everything is in order. :) Excellent! But an example like this presumably helps make Tom’s case that each branch could have a file that suggests which commit to use as the base for comparison, so that in this example it could be set to 344662848ac and the failures would go away. As it is, they will persist until a new tag is added or one overrides the base in the build farm client config. I don't think we need any ABI checking until there is a dot-0 release, so I don't agree that a facility like that is needed. Just compare against the previous release tag.
Re: ABI Compliance Checker GSoC Project
On Sep 12, 2025, at 11:37, Peter Eisentraut wrote: > I don't think we need any ABI checking until there is a dot-0 release, so I > don't agree that a facility like that is needed. Hrm. I wonder how best to filter out a branch without a dot-0 tag. > Just compare against the previous release tag. Yep, that’s what it’s doing. Best, David signature.asc Description: Message signed with OpenPGP
Re: plan shape work
On Fri, Sep 12, 2025 at 11:08 AM Tom Lane wrote: > I was arguing for labeling plan nodes according to which OJ they > start (always a unique relid). Richard was arguing for labeling > according to which OJ(s) they complete (zero, one, or more relids). I agree with everything in your reply up to and including this part. > But I now think it's probably worth doing both. We need the > completion bitmapsets if we want to cross-check Var nullingrels, > because those correspond to the nullingrels that should get added > at each join's output. I think that we also want the start labels > though. For one thing, if the start nodes are not identified, > it's impossible to understand how much of the tree is the "no > man's land" where a C variable may or may not have gone to null > on its way to becoming a C* variable. But in general I think > that we'll want to be able to identify an outer-join plan node > even if it does not complete its OJ. So, it looks to me like the way this works today is that join_is_legal() figures out the relevant SpecialJoinInfo and then add_outer_joins_to_relids() decides what to add to the joinrel's relid set. So I think, though I am not quite sure, that if somewhere around that point in the code we copied sjinfo->ojrelid into the RelOptInfo, and then propagated that through to the final plan, that might be what you're looking for here. However, that assumes that the choice of SpecialJoinInfo is fixed for all possible ways of constructing a given joinrel, which I think might not be true. In Richard's test case, one simply can't go wrong, because the lower join only draws from two baserels. But in a case like A LJ B ON A.x = B.x LJ C ON B.y = C.y LJ D ON B.z = D.z, the B-C-D joinrel could presumably be constructed by joining either B-D to C or B-C to D, and I'm guessing that will result in a different choice of SpecialJoinInfo in each case. That would mean that the ojrelid has to be per-Path rather than per-RelOptInfo. While in theory that's fine, it sounds expensive. I was hoping that we could piggyback mostly on existing calculations here, exposing data we already have instead of calculating new things. If we want to expose the starting-the-outer-join-RTI value that you want here, it seems like we're going to have to redo some of the join_is_legal() work and some of the add_outer_joins_to_relids() work for every path. I'm skeptical about expending those cycles. It's not clear to me that I need to care about outer join RTIs at all for what I'm trying to do -- focusing where RTIs originating from baserels end up in the final plan tree is, as far as I can see, completely adequate. There might be other people who want to do other things that would benefit more from seeing that stuff, though. I'm not against exposing information that is easily calculated and might be useful to somebody, even if it's just for planner debugging. But it seems to me that what you're asking here might be going quite a bit further than that. So my counter-proposal is that this patch set should either (1) expose nothing at all about join RTIs because I don't have a need for them or (2) expose the join RTIs completed at a certain level because that's easily calculated from the data we already have; and if you want to later also expose the single join RTI started at a certain level for a varnullingrels cross-check or any other purpose, then you can propose a patch for that. Alternatively, if you want to edit my 0003 patch to work the way you think it should, cool. 
Or if you can describe what you think it should do, I'm somewhat willing to try implementing that, but that's definitely not such a great choice from my perspective. I'm afraid that I'm getting pulled a bit further down the garden path than I really want to go trying to satisfy your desire to perform a cross-check that I don't really understand and/or expose information for which I don't see a clear need in my own work. What I need is for all the baserels that appear in the final Plan tree to be properly labelled. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [WIP]Vertical Clustered Index (columnar store extension) - take2
Hello Peter! Thanks for updates. Here is a small fix for clang "variable set but not used" warnings. -- Regards, Timur Magomedov From 8c17b18dc0c15ae87945169a85c3ee944fc815b7 Mon Sep 17 00:00:00 2001 From: Timur Magomedov Date: Fri, 12 Sep 2025 17:09:50 +0300 Subject: [PATCH] Fixed "set but not used" clang warnings --- contrib/vci/executor/vci_vector_executor.c | 4 contrib/vci/storage/vci_fetch.c| 3 --- 2 files changed, 7 deletions(-) diff --git a/contrib/vci/executor/vci_vector_executor.c b/contrib/vci/executor/vci_vector_executor.c index daf475b964e..d89988b4e75 100644 --- a/contrib/vci/executor/vci_vector_executor.c +++ b/contrib/vci/executor/vci_vector_executor.c @@ -1792,7 +1792,6 @@ static vci_vp_item_id vci_add_control_nodes(VciVPExecOp_func head_func, VciVPExecOp_func next_func, List *args, Expr *expr, PlanState *parent, ExprContext *econtext, VciVPContext *vpcontext, uint16 *skip_list) { - int i; vci_vp_item_id ret; ListCell *l; Datum *itemValue = palloc(sizeof(Datum) * VCI_MAX_FETCHING_ROWS); @@ -1807,7 +1806,6 @@ vci_add_control_nodes(VciVPExecOp_func head_func, VciVPExecOp_func next_func, Li head_node->itemIsNull = itemIsNull; head_node->data.init.orig_skip_list = skip_list; - i = 0; foreach(l, args) { Expr *arg = (Expr *) lfirst(l); @@ -1820,8 +1818,6 @@ vci_add_control_nodes(VciVPExecOp_func head_func, VciVPExecOp_func next_func, Li next_node = &vpcontext->itemNode[ret]; next_node->itemValue = itemValue; next_node->itemIsNull = itemIsNull; - - i++; } return ret; diff --git a/contrib/vci/storage/vci_fetch.c b/contrib/vci/storage/vci_fetch.c index 80e497c4628..588ae59fa43 100644 --- a/contrib/vci/storage/vci_fetch.c +++ b/contrib/vci/storage/vci_fetch.c @@ -1479,7 +1479,6 @@ FillFixedWidthCopyBody1(char *dstData, int numRows) { int aId; - int numWrite = 0; BlockNumber bNumCur = startBN; int64 offset = startOf; Buffer buffer = InvalidBuffer; @@ -1516,7 +1515,6 @@ FillFixedWidthCopyBody1(char *dstData, #endif /* #ifdef WORDS_BIGENDIAN */ dstData += stepDstData; offset += dataWidth; - ++numWrite; } aId = maxBId; @@ -1547,7 +1545,6 @@ FillFixedWidthCopyBody1(char *dstData, #endif /* #ifdef WORDS_BIGENDIAN */ dstData += stepDstData; offset = nextOffset; - ++numWrite; ++aId; } else -- 2.43.0
Re: plan shape work
Robert Haas writes: > So, it looks to me like the way this works today is that > join_is_legal() figures out the relevant SpecialJoinInfo and then > add_outer_joins_to_relids() decides what to add to the joinrel's relid > set. So I think, though I am not quite sure, that if somewhere around > that point in the code we copied sjinfo->ojrelid into the RelOptInfo, > and then propagated that through to the final plan, that might be what > you're looking for here. Not the RelOptInfo, but the Path. I have not looked at details, but it might be necessary to identify these labels at Path construction time rather than reconstructing them during createplan.c. I'd rather not, because bloating Path nodes with hopefully-redundant information isn't attractive. But if it turns out that createplan.c doesn't have enough information then we might have to. I'm also thinking that this notion of starting/completing OJs might be useful in its own right to clarify and even simplify some of the Path manipulations we do. But that will require reviewing the code. > So my counter-proposal is that this patch set should either (1) expose > nothing at all about join RTIs because I don't have a need for them or > (2) expose the join RTIs completed at a certain level because that's > easily calculated from the data we already have; and if you want to > later also expose the single join RTI started at a certain level for a > varnullingrels cross-check or any other purpose, then you can propose > a patch for that. If what you want to do is only interested in baserel RTIs, then I think we should leave outer join RTIs out of the discussion for the present. I was not looking to make you do work you aren't interested in. I reserve the right to do said work later ... regards, tom lane
Re: Incorrect logic in XLogNeedsFlush()
On Fri, 2025-09-12 at 14:04 +0900, Michael Paquier wrote: > On Fri, Sep 12, 2025 at 10:21:27AM +0530, Dilip Kumar wrote: > > Yeah, asserting it at the end makes sense, as we can ensure that > > XLogFlush() and XLogNeedsFlush() agree on the same thing without > > adding additional overhead. Here is the revised one. > > Melanie and Jeff, what do you think? IIUC: during the end-of-recovery checkpoint, a hypothetical call to XLogNeedsFlush() would have incorrectly used the LocalMinRecoveryPoint; and with the change, it would correctly use the flush pointer. That wasn't a live bug because XLogNeedsFlush() was never actually called during the end-of-recovery checkpoint. CreateCheckPoint() called XLogFlush() directly, which correctly checked the flush pointer. But, hypothetically, if that code had logic based around XLogNeedsFlush() before calling XLogFlush(), that would have been a bug. So this fix has no behavior change, but it would make the code more resilient against changes to CreateCheckPoint(), more consistent, and less confusing. Is that right? Assuming I'm right so far, then I agree with changing the test in XLogNeedsFlush() to !XLogInsertAllowed(), as the patch does. The comment in the patch is describing what's happening, but lost the "why". Perhaps something like: "During recovery, we don't flush WAL but update minRecoveryPoint instead. So "needs flush" is taken to mean whether minRecoveryPoint would need to be updated. The end-of-recovery checkpoint still must flush WAL like normal, though, so check !XLogInsertAllowed(). This check must be consistent with XLogFlush()." The commit message is vague about whether there's a live bug or not; I suggest clarifying that the inconsistency between the two functions could have been a bug but wasn't. Also, I generally use past tense for the stuff that's being fixed and present tense for what the patch does. One loose end: there was some discussion about the times when LocalMinRecoveryPoint is valid/correct. I'm not sure I entirely understood -- is that no longer a concern? Regards, Jeff Davis
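To make the proposed shape concrete, here is a much-simplified sketch of XLogNeedsFlush() with the changed test and the suggested comment wording. The real function also refreshes LocalMinRecoveryPoint and the flush pointer from shared memory and handles the invalid-minRecoveryPoint case; those details are omitted, so treat this as illustrative only.

bool
XLogNeedsFlush(XLogRecPtr record)
{
	/*
	 * During recovery, we don't flush WAL but update minRecoveryPoint
	 * instead.  So "needs flush" is taken to mean whether minRecoveryPoint
	 * would need to be updated.  The end-of-recovery checkpoint still must
	 * flush WAL like normal, though, so check !XLogInsertAllowed() rather
	 * than RecoveryInProgress().  This check must be consistent with
	 * XLogFlush().
	 */
	if (!XLogInsertAllowed())
		return record > LocalMinRecoveryPoint;	/* simplified */

	/* normal running: compare against the known flush pointer */
	return record > LogwrtResult.Flush;	/* simplified */
}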
Re: ABI Compliance Checker GSoC Project
On 08.09.25 17:17, David E. Wheeler wrote: The RC1 change surprised me a little; here’s the log: Leaf changes summary: 1 artifact changed Changed leaf types summary: 0 leaf type changed Removed/Changed/Added functions summary: 0 Removed, 1 Changed, 0 Added function Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable 1 function with some sub-type change: [C] 'function void CheckValidResultRel(ResultRelInfo*, CmdType, List*)' has some sub-type changes: parameter 4 of type 'List*' was added parameter 3 of type 'List*' changed: entity changed from 'List*' to 'typedef OnConflictAction' type size changed from 8 to 4 (in bytes) type alignment changed from 0 to 4 Presumably this is expected, but it looks like it might be an issue if it weren’t a pre-release change, yes? This was a change that was intentionally backpatched in a different way in order to preserve ABI compatibility. Compare commits 344662848ac on REL_18_STABLE and 0b934d3994f on REL_17_STABLE. So everything is in order. :)
Re: PostgreSQL 18 GA press release draft
On Tue, 2025-09-09 at 23:13 -0400, Jonathan S. Katz wrote: > Hi, > > Attached is a draft of the PostgreSQL 18 GA press release. A few > comments before the ask for reviewing: "This release adds the [`PG_UNICODE_FAST`], which accelerates lookups using the `upper` and `lower` string comparison functions, and helps speed up the new [`casefold`] function for case-insensitive comparisons." Use the link: https://www.postgresql.org/docs/18/collation.html#COLLATION-MANAGING-STANDARD which is a bit more descriptive. (Though perhaps that document could be improved with examples.) What's actually new with PG_UNICODE_FAST are the "full" Unicode semantics for case transformations, e.g. it uppercases 'ß' to 'SS'. The rest is the same as PG_C_UTF8, which uses "simple" Unicode semantics. The name PG_UNICODE_FAST is meant to convey that it provides full Unicode semantics for case mapping and pattern matching, while also being fast because it uses memcmp for comparisons. While the name PG_C_UTF8 is meant to convey that it's closer to what users of the libc "C.UTF-8" locale might expect. Regards, Jeff Davis
Re: ABI Compliance Checker GSoC Project
Peter Eisentraut writes: > On 12.09.25 16:52, David E. Wheeler wrote: >> Excellent! But an example like this presumably helps make Tom’s case that >> each branch could have a file that suggests which commit to use as the base >> for comparison, so that in this example it could be set to 344662848ac and >> the failures would go away. As it is, they will persist until a new tag is >> added or one overrides the base in the build farm client config. > I don't think we need any ABI checking until there is a dot-0 release, > so I don't agree that a facility like that is needed. Just compare > against the previous release tag. My concern is that when we get a report, we might decide to apply some fix to remove the ABI delta, or we might decide it's intentional and/or harmless and leave the code as-is. In the latter case, the ABI-checking BF animal(s) will be red until the next release is made, which helps nobody, particularly because it will probably interfere with noticing any ABI breaks in later commits. So I think we do need a way to move a branch's baseline that is not rigidly tied to making a release. The only way we could possibly use this testing methodology without that is if the ABI checker agrees 100.00% with our opinion of what is or is not a breaking change. Which we already see isn't so. A concrete example is the hack we frequently use of sandwiching a new field into what had been padding space. I would be astonished if an ABI checker doesn't flag that, but the checker's output doesn't make it not OK. I'm not wedded to any particular way of implementing that, though I'd prefer to avoid basing it on git tags given that they can't readily be deleted or moved. regards, tom lane
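As a concrete illustration of that padding hack (with a made-up struct, not any real PostgreSQL node): the back-patched field slots into bytes that were previously alignment padding, so the size and the offsets of pre-existing fields stay the same and extensions compiled against the old layout keep working, yet a field-by-field ABI checker will still report a change.

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* released layout: seven padding bytes follow "flag" on a typical LP64 target */
typedef struct Widget_v1
{
	int64_t		id;
	bool		flag;
	int64_t		count;
} Widget_v1;

/* back-patched layout: new field tucked into the former padding */
typedef struct Widget_v2
{
	int64_t		id;
	bool		flag;
	uint8_t		extra;			/* new field occupies one old padding byte */
	int64_t		count;
} Widget_v2;

int
main(void)
{
	/* size and existing-field offsets are unchanged */
	static_assert(sizeof(Widget_v1) == sizeof(Widget_v2), "size unchanged");
	static_assert(offsetof(Widget_v1, count) == offsetof(Widget_v2, count),
				  "offset of pre-existing field unchanged");
	return 0;
}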
Avoid resource leak (src/test/modules/test_binaryheap/test_binaryheap.c)
Hi. Per Coverity. Coverity reports this resource leak in test_binaryheap module. I think that is right. Trivial patch attached. best regards, Ranier Vilela avoid-resource-leak-test_binaryheap.patch Description: Binary data
Re: Unexpected changes of CurrentResourceOwner and CurrentMemoryContext
Álvaro Herrera wrote: > On 2025-Sep-12, Antonin Houska wrote: > > > Euler Taveira wrote: > > > > > Interesting. I'm wondering that if this patch is applied we could remove > > > the > > > following code [...] from pg_logical_slot_get_changes_guts and > > > LogicalSlotAdvanceAndCheckSnapState functions too. IIUC the referred > > > code is a band-aid that will be improved someday. > > > > Even though we're fixing the likely reason of this problem, we cannot be > > 100% > > sure that no other problem like this still exists. So I'd not remove this > > assignment. Maybe add Assert(CurrentResourceOwner == old_resowner) in front > > of > > that, and adjust the comment? > > Yeah, I'm going to risk removing it, because if we don't do it now, > we're never going to do it. We can mitigate the risk of missing > remaining bugs by having that assertion you suggest, so that if anyone > actually hits a problem here, we'll know soon enough. > > I have pushed it with that change. I'll also add an open item for pg19 > so that we remember to come back to remove the assertions, if we feel we > no longer need them. I was worried about removing those workarounds because it was not trivial to diagnose the issue. But it should be ok for the master branch. Thanks. -- Antonin Houska Web: https://www.cybertec-postgresql.com
Re: Proposal: Out-of-Order NOTIFY via GUC to Improve LISTEN/NOTIFY Throughput
On Wed, Sep 10, 2025 at 12:00 PM Arseniy Mukhin wrote: > > > 6) We have fixed size queue entries, so I think we don't need this > "padding" logic at the end of the page anymore, because we know how > many entries we can have on each page. +1 Probably we also no longer need to have a pair of page number and offset number as an entry queue's position since AsyncQueueEntry is now a fixed size. > BTW, what do you think about creating a separate thread for the patch? > The current thread's subject seems a bit irrelevant. +1 Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
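A tiny sketch of what both points could look like once entries are fixed size. The struct layout and names here are invented for illustration; the real AsyncQueueEntry definition lives in commands/async.c and may differ.

#include <stdint.h>

/* invented fixed-size entry, for illustration only */
typedef struct QueueEntrySketch
{
	int32_t		dboid;
	int32_t		xid;
	int32_t		srcPid;
	char		data[116];		/* channel name plus payload, fixed size */
} QueueEntrySketch;

#define QUEUE_PAGESIZE		8192
#define ENTRIES_PER_PAGE	(QUEUE_PAGESIZE / sizeof(QueueEntrySketch))

/*
 * With every entry the same size there is no end-of-page padding to skip,
 * and a queue position can be a single 64-bit counter rather than a
 * (page, offset) pair: the page and slot fall out arithmetically.
 */
static inline uint64_t queue_pos_page(uint64_t pos) { return pos / ENTRIES_PER_PAGE; }
static inline uint64_t queue_pos_slot(uint64_t pos) { return pos % ENTRIES_PER_PAGE; }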
Re: split func.sgml to separated individual sgml files
On 2025-09-12 Fr 10:12 AM, Nazir Bilal Yavuz wrote: Hi, On Tue, 2 Sept 2025 at 17:54, Andrew Dunstan wrote: Ah, you’re right, but then again, I’d expect ALL_SGML to be used consistently, but it isn't and I didn't check. v3 does that. Note that GENERATED_SGML where'te included in these two targets but I think there's no harm in checking them too. Do we actually care about those? I don't want to add needless cycles anywhere. I note that the meson.build doesn't appear to have a check target at all, or anything that looks for hard tabs or nbsps.Those checks were added to the Makefile back in October in commit 5b7da5c261d, but that got missed even though Daniel had mentioned it in the discussion thread.[1] I have been working on running these checks under the Meson build system. Thanks for this! To do this, I converted the checks into a Perl script (sgml_syntax_check) and ran it against both the Makefile and Meson. Test's name is 'sgml_syntax_check' in the Meson. One difference I noticed: I could not find a way in Meson to create a test that does not run by default. As a result, this syntax test runs every time you run the 'meson test'. This behaviour differs from Autoconf, but I think it is acceptable. Yes, I think so too. Additionally, some of the CI OSes were missing docbook-xml; but it has now been installed. I did not create a new thread for that, I can create one if you think that it would be better. CI run with the attached patch applied: https://cirrus-ci.com/build/6610354173640704 I am away this coming week, will check it out in detail when I return. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: PostgreSQL 18 GA press release draft
On Fri, Sep 12, 2025 at 8:46 AM Jonathan S. Katz wrote: > > On 9/9/25 11:13 PM, Jonathan S. Katz wrote: > > Hi, > > > > Attached is a draft of the PostgreSQL 18 GA press release. A few > > comments before the ask for reviewing: > > > > * I still need to write up the theme and the quote, which I'll provide > > tomorrow after hearing the first round of feedback > > * The release is a bit long - this is due in part to what I think is a > > lot of really good features in PostgreSQL 18. "Something for everyone" > > > > Please review for the press release for the following: > > > > * Correctness and accuracy of what's described - we want to be > > technically correct, but we also need to explain why something matters > > to our users. > > * Notable omissions (apologies if I missed something, but also note not > > everything going to make the cut). > > * Features to exclude. > > * How easy phrases are to translate / confusing phrases > > * Typos. > > > > Like previous years, I won't really look at stylistic remarks, unless it > > falls into the "making it simpler to translate / confusing phrase" bucket. > > > > After 2025-09-14 0:00 UTC I won't be able to accept more feedback, as > > we'll need to begin the translation process. Anyone who provides > > accepted feedback will also get a mention in the press release "thank > > you" page[1], which stays up for about a year ;) > > Thanks for all of your feedback thus far! Attached is the next version > of the draft. This incorporates the feedback, and now includes the > thematic statement and a draft quote. > > As per above, I'll need remaining feedback no later than 2025-09-14 0:00 > UTC - after this, it's frozen for the translation process to begin. > I found a typo; s/gen_rand_uuid()/gen_random_uuid()/ Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: PostgreSQL 18 GA press release draft
On Fri, Sep 12, 2025 at 10:11:59AM -0700, Jeff Davis wrote: > The name PG_UNICODE_FAST is meant to convey that it provides full > unicode semantics for case mapping and pattern matching, while also > being fast because it uses memcmp for comparisons. While the name > PG_C_UTF8 is meant to convey that it's closer to what users of the libc > "C.UTF-8" locale might expect. How does one do form-insensitive comparisons? Nico --
Re: pg_waldump: support decoding of WAL inside tarfile
Here are some review comments on v3-0004: In general, I think this looks pretty nice, but I think it needs more cleanup and polishing. There doesn't seem to be any reason for astreamer_waldump_content_new() to take an astreamer *next argument. If you look at astreamer.h, you'll see that some astreamer_BLAH_new() functions take such an argument, and others don't. The ones that do forward their input to another astreamer; the ones that don't, like astreamer_plain_writer_new(), send it somewhere else. AFAICT, this astreamer is never going to send its output to another astreamer, so there's no reason for this argument. I'm also a little confused by the choice of the name astreamer_waldump_content_new(). I would have thought this would be something like astreamer_waldump_new() or astreamer_xlogreader_new(). The word "content" doesn't seem to me to be adding much here, and it invites confusion with the "content" callback. I think you can merge setup_astreamer() into init_tar_archive_reader(). The only other caller is verify_tar_archive(), but that does exactly the same additional steps as init_tar_archive_reader(), as far as I can see. The return statement for astreamer_wal_read is really odd: + return (count - nbytes) ? (count - nbytes) : -1; Since 0 is false in C, this is equivalent to: count != nbytes ? count - nbytes : -1, but it's a strange way to write it. What makes it even stranger is that it seems as though the intention here is to count the number of bytes read, but you do that by taking the number of bytes requested (count) and subtracting the number of bytes we didn't manage to read (nbytes); and then you just up and return -1 instead of 0 whenever the answer would have been zero. This is all lacking in comments and seems a bit more confusing than it needs to be. So my suggestions are: 1. Consider redefining nbytes to be the number of bytes that you have read instead of the number of bytes you haven't read. So the loop in this function would be while (nbytes < count) instead of while (nbytes > 0). 2. If you need to map 0 to -1, consider having the caller do this instead of putting that inside this function. 3. Add a comment saying what the return value is supposed to be". If you do both 1 and 2, then the return statement can just say "return nbytes;" and the comment can say "Returns the number of bytes successfully read." I would suggest changing the name of the variable from "readBuff" to "readBuf". There are no existing uses of readBuff in the code base. I think this comment also needs improvement: + /* +* Ignore existing data if the required target page has not yet been +* read. +*/ + if (recptr >= endPtr) + { + len = 0; + + /* Reset the buffer */ + resetStringInfo(astreamer_buf); + } This comment is problematic for a few reasons. First, we're not ignoring the existing data: we're throwing it out. Second, the comment doesn't say why we're doing what we're doing, only that we're doing it. Here's my guess at the actual explanation -- please correct me if I'm wrong: "pg_waldump never reads the same WAL bytes more than once, so if we're now being asked for data beyond the end of what we've already read, that means none of the data we currently have in the buffer will ever be consulted again. So, we can discard the existing buffer contents and start over." By the way, if this explanation is correct, it might be nice to add an assertion someplace that verifies it, like asserting that we're always reading from an LSN greater than or equal to (or exactly equal to?) 
the LSN immediately following the last data we read. In general, I wonder whether there's a way to make the separation of concerns between astreamer_wal_read() and TarWALDumpReadPage() cleaner. Right now, the latter is basically a stub, but I'm not sure that is the best thing here. I already mentioned one example of how to do this: make the responsibility for 0 => -1 translation the job of TarWALDumpReadPage() rather than astreamer_wal_read(). But I think there might be a little more we can do. In particular, I wonder whether we could say that astreamer_wal_read() is only responsible for filling the buffer, and the caller, TarWALDumpReadPage() in this case, needs to empty it. That seems like it might produce a cleaner separation of duties. Another thing that isn't so nice right now is that verify_tar_archive() has to open and close the archive only for init_tar_archive_reader() to be called to reopen it again just moments later. It would be nicer to open the file just once and then keep it open. Here again, I wonder if the separation of duties could be a bit cleaner. Is there a real need to pass XLogDumpPrivate to astreamer_wal_read or astreamer_archive_read? The only things that they need are archive_fd, archive_name, archive_streamer, archive_streamer_
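To make suggestions 1 through 3 above concrete, here is a rough, self-contained sketch of the proposed loop shape: count bytes read upward, return that count, and leave any 0-to-minus-1 translation to the caller. The callback type and all names below are stand-ins, not the patch's actual API.

#include <stddef.h>

typedef int (*wal_source_cb) (char *dst, size_t len);	/* hypothetical source */

/*
 * Read up to "count" bytes into readBuf by repeatedly asking the source
 * callback for more data.  Returns the number of bytes successfully read;
 * the page-read callback that calls this can map 0 to -1 if it needs to.
 */
static int
wal_read_sketch(wal_source_cb fetch_more, char *readBuf, size_t count)
{
	size_t		nbytes = 0;		/* bytes read so far, not bytes remaining */

	while (nbytes < count)
	{
		int			got = fetch_more(readBuf + nbytes, count - nbytes);

		if (got <= 0)
			break;				/* source exhausted or failed */
		nbytes += got;
	}

	return (int) nbytes;
}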
Re: Avoid resource leak (src/test/modules/test_binaryheap/test_binaryheap.c)
On Fri, Sep 12, 2025 at 1:54 PM Ranier Vilela wrote:
> Per Coverity.
>
> Coverity reports this resource leak in test_binaryheap module.
> I think that is right.
>
> Trivial patch attached.

If this were correct, we'd need to also free the memory in all the error paths. But of course, in both error and non-error paths, we rely on memory context cleanup to free memory for us, except in cases where there's some specific reason to believe that's not good enough. I doubt that there is any such reason in this case. See src/backend/utils/mmgr/README

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring
On Fri, Sep 12, 2025 at 9:03 AM Naga Appani wrote:
> +
> + Returns statistics about current multixact usage:
> + num_mxids is the total number of multixact IDs assigned since startup,
> + num_members is the total number of multixact member entries created since startup,

GetMultiXactInfo() returns the following:

*members = nextOffset - *oldestOffset;
*multixacts = nextMultiXactId - *oldestMultiXactId;

They seem to be the numbers that exist in the system at the time of the call and not since the startup. Am I missing something?

+ up to approximately 2^32 entries(occupying roughly 20GB in the

A space is needed between "entries" and "(".

+ proallargtypes => '{int4,int8,int8,xid,int8}',

I think the first parameter should also be int8, since uint32 won't fit into int4.

> > > + See for further details on multixact wraparound.
> >
> > I don't think we need this reference here. Reference back from that section is enough.
>
> I kept the cross-reference for now since other multixact function docs (such as pg_get_multixact_members()) already use this style, and it helps readers who land directly on the function reference page. Please let me know if you would prefer that I remove it.

None of the write-up there talks about multixact wraparound, so the reference seems arbitrary to me. I would remove it.

> > + * Returns NULL if the oldest referenced offset is unknown, which happens during
> > + * system startup or when no MultiXact references exist in any relation.
> >
> > If no MultiXact references exist, and GetMultiXactInfo() returns false, MultiXactMemberFreezeThreshold() will assume the worst, which I take as meaning that it will trigger aggressive autovacuum. No MultiXact references existing is a common case which shouldn't be assumed as the worst case. The comment I quoted means "the oldest value of the offset referenced by any multi-xact referenced by a relation may not always be known". You seem to have interpreted "may not be known" as "does not exist". That's not right. I would write this as "Returns NULL if the oldest referenced offset is unknown, which happens during system startup".
> >
> > Similarly I would rephrase the following docs as
> > +
> > + The function returns NULL when multixact statistics are unavailable.
> > + For example, during startup before multixact initialization completes or when
> > + the oldest member offset cannot be determined.
> >
> > "The function returns NULL for multixact statistics when the oldest multixact offset corresponding to a multixact referenced by a relation is not known after starting the system."
>
> Updated.

Thanks for updating the documentation. But the comment in the prologue of pg_get_multixact_stats is not completely correct, as mentioned in my previous reply. I would just say "Returns NULL if the oldest referenced offset is unknown". Anybody who wants to know when that can happen may search the relevant code by looking at GetMultiXactInfo().

I still find the comment at the start of the isolation test a bit verbose. But I think it's best to leave that to a committer's judgement. However, the word "locker" is unusual. You want to say the session that locks a row (or something similar). We may leave the exact words to a committer's judgement.

I still think that the specific usage scenarios described in the documentation are not required. And the computation in pg_get_multixact_stats() should use macros instead of bare numbers. But we can leave that for a committer to decide.
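As a quick illustration of the int4 point (not taken from the patch): a uint32 counter can exceed the range of a signed 32-bit integer, so an int4 output column would overflow, while int8 covers the full range:

    -- int4 (signed 32-bit) tops out at 2147483647, so a uint32-sized value
    -- such as 3 billion cannot pass through an int4 column:
    SELECT 3000000000::int4;   -- ERROR:  integer out of range
    SELECT 3000000000::int8;   -- fine; int8 covers the whole uint32 range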
Once you address above comments, we may mark the CF entry as RFC. -- Best Wishes, Ashutosh Bapat
Re: PostgreSQL 18 GA press release draft
Hello,

Thanks for putting this together! It's quite the monster. I read through and found the following points worth mentioning:

I think almost all of the non-stock section titles Are Lacking Title Case.

> The new [`io_method`](https://www.postgresql.org/docs/18/runtime-config-resource.html#GUC-IO-METHOD) setting lets you toggle between the AIO methods, including `worker` and `io_uring` (when built with PostgreSQL, available on certain Linux systems), or you can choose to maintain the current PostgreSQL behavior with the `sync` setting. There are now more parameters to consider tuning with AIO, which you can [learn more about in the documentation](https://www.postgresql.org/docs/18/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-IO).

I don't understand the "when built with PostgreSQL". Did you mean to reference something else here?

> PostgreSQL 18 further accelerates query performance with features that automatically make your workloads faster. This release introduces "skip scan" lookups on [multicolumn B-tree indexes](https://www.postgresql.org/docs/18/indexes-multicolumn.html), which improves execution time for queries that omit an `=` condition on one or more prefix index columns. It can also optimize queries that use `OR` conditions in a `WHERE` to use an index, leading to significantly faster execution. There are also numerous improvements for how PostgreSQL plans and executes table joins, from boosting the performance of hash joins to allowing merge joins to use incremental sorts.

introduces "skip scan" lookups ..., which improve (remove ending 's')

> PostgreSQL 18 now supports parallel builds for [GIN indexes](https://www.postgresql.org/docs/18/gin.html), joining B-tree and [BRIN indexes](https://www.postgresql.org/docs/current/brin.html) in supporting this capability. Additionally, [materialized views](https://www.postgresql.org/docs/18/rules-materializedviews.html) can now use unique indexes that aren't B-trees as partition keys, expanding how you can construct materialized views.

Materialized views can use non-btree indexes as partition keys? Sounds cool, but the linked-to matview page doesn't talk about partitioning at all. I think there's something wrong with the way this has been written.

[trawls the release notes] Hmm, the release notes have this text: "Allow non-btree unique indexes to be used as partition keys and in materialized views". I think the confusion stems from having missed the "and" there.

Actually I wonder if these two items (commits f278e1fe3 and 9d6db8bec) are actually worthy of being in the press release. They are about using unique indexes that aren't btrees, but as I understand, in stock Postgres there isn't any other way to build unique indexes, so this is about allowing out-of-core index AMs to be used in these contexts.

> PostgreSQL 18 makes text processing easier and faster with several new enhancements. This release adds the [`PG_UNICODE_FAST`](https://www.postgresql.org/docs/18/locale.html#LOCALE-PROVIDERS), which accelerates lookups using the `upper` and `lower` string comparison functions, and helps speed up the new [`casefold`](https://www.postgresql.org/docs/18/functions-string.html#FUNCTIONS-STRING-OTHER) function for case-insensitive comparisons.
> Additionally, PostgreSQL 18 now supports making `LIKE` comparisons over text that uses a [nondeterministic collation](https://www.postgresql.org/docs/18/collation.html#COLLATION-NONDETERMINISTIC), simplifying how you can perform more complex pattern matching. This release also changes [full text search](https://www.postgresql.org/docs/18/textsearch.html) to use the default collation provider of a cluster instead of always using libc, which may require you to reindex all [full text search](https://www.postgresql.org/docs/18/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX) and [`pg_trgm`](https://www.postgresql.org/docs/18/pgtrgm.html#PGTRGM-INDEX) indexes after running [`pg_upgrade`](https://www.postgresql.org/docs/18/pgupgrade.html).

I think this should say "This release adds the PG_UNICODE_FAST locale provider", or something like that, because ending in just "PG_UNICODE_FAST" seems to be unintelligible.

> ### Authentication and security features

In this section I would also mention that pgcrypto gained SHA-2 cipher for passwords (commit 749a9e20c979).

> PostgreSQL 18 now supports reporting logical replication write conflicts in logs and in the [`pg_stat_subscription_stats`](https://www.postgresql.org/docs/18/monitoring-stats.html#MONITORING-PG-STAT-SUBSCRIPTION-STATS) view. Additionally, [`CREATE SUBSCRIPTION`](https://www.postgresql.org/docs/18/sql-createsubscription.html) now defaults to use parallel streaming for applying transactions, which can help improve performance. The [`pg_createsubscriber`](https://www.postgresql.org/docs/18/app-pgcreatesub
Re: Increase OpenBSD CI task's RAM disk size
Hi,

On 2025-09-02 10:24:34 +0200, Daniel Gustafsson wrote:
> > On 15 Aug 2025, at 12:20, Nazir Bilal Yavuz wrote:
> > On Fri, 15 Aug 2025 at 12:19, Daniel Gustafsson wrote:
> >>
> >>> On 15 Aug 2025, at 10:46, Nazir Bilal Yavuz wrote:
> >>> Sometimes OpenBSD CI tasks fail with 'No space left on device' [1]. It's due to the OpenBSD CI task's RAM disk size being ~3.8 GB. This patch increases it to ~4.6 GB. I chose this size manually, we can increase it more.
> >>
> >> I don't know Cirrus enough to know if this is a daft question, but does this impact the cost (in actual money or credits) for running the CI tests?
> >
> > I don't expect this to increase costs, and it might even reduce them slightly. Disks generally perform slower when free space is low, so the change could make OpenBSD CI tasks faster.
>
> Thanks for clarifying, this definitely seems like something we should do to keep tests from failing. Pushed it now.

I've just enabled openbsd, netbsd and mingw to run by default on postgres/postgres. If that works without a problem, I'll also enable it on postgresql-cfbot.

Greetings,

Andres Freund
Re: ABI Compliance Checker GSoC Project
Álvaro Herrera writes:
> On 2025-Sep-12, Tom Lane wrote:
>> My concern is that when we get a report, we might decide to apply some fix to remove the ABI delta, or we might decide it's intentional and/or harmless and leave the code as-is.

> The solution I propose for this (which I have mentioned before) is to allow for our source tree to carry exclusion files. The ABI checker would refrain from turning red for any breaks that match what's in those files.

That seems substantially more complicated than just moving which commit is considered the baseline. It might be less reliable too: if the exclusions are written sloppily, they might hide problems we'd rather find out about. (I do not regard valgrind suppression files as a model of good engineering...)

			regards, tom lane
Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options
>> Attached are the v18 patches for adding RESPECT/IGNORE NULLS options to some window functions. Recent changes to doc/src/sgml/func.sgml required v17 to be rebased. Other than that, nothing has been changed.
>>
>> Oliver, do you have any comments on the patches?
>
> Looks good, tried it on the nth_value test script from a bit ago - I added a 1 million rows test and it takes an average of 12 seconds on my i7.

I would like to push the patch by the end of this month or early in October if there's no objection. Comments/suggestions are welcome.

--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
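For readers following the thread, here is roughly what the feature enables; this is a sketch based on the SQL-standard null treatment clause that the patches implement, with made-up table and column names:

    -- Carry the most recent non-NULL reading forward, skipping NULL rows
    -- instead of returning them:
    SELECT ts,
           last_value(reading) IGNORE NULLS
               OVER (ORDER BY ts
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS filled
    FROM sensor_data;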
Re: race condition in pg_class
Noah Misch writes:
> On Thu, Sep 11, 2025 at 10:12:59PM -0400, Tom Lane wrote:
>> I'm content with making it master-only.

> I've pushed the test update to master.

Thanks. I'll go deal with the original issue shortly.

			regards, tom lane
Re: race condition in pg_class
On Thu, Sep 11, 2025 at 10:12:59PM -0400, Tom Lane wrote:
> Noah Misch writes:
> > On Thu, Sep 11, 2025 at 03:34:47PM -0400, Tom Lane wrote:
> >> However ... it might still be okay to cram it into v18. Do you have an opinion about that?
>
> > Feels like something I wouldn't do, but I don't have a concrete reason.
>
> I'm content with making it master-only.

I've pushed the test update to master.
Re: pg_waldump: support decoding of WAL inside tarfile
On Fri, Sep 12, 2025 at 2:28 PM Robert Haas wrote:
> Is there a real need to pass XLogDumpPrivate to astreamer_wal_read or astreamer_archive_read? The only things that they need are archive_fd, archive_name, archive_streamer, archive_streamer_buf, and archive_streamer_read_ptr. In other words, they really don't care about any of the *existing* things that are in XLogDumpPrivate. This makes me wonder whether we should actually try to make this new astreamer completely independent of xlogreader. In other words, instead of calling it astreamer_waldump() or astreamer_xlogreader() as I proposed above, maybe it could be a completely generic astreamer, say astreamer_stringinfo_new(StringInfo *buf) that just appends to the buffer. That would require also moving the stuff out of astreamer_wal_read() that knows about XLogRecPtr, but why does that function need to know about XLogRecPtr? Couldn't the caller figure out that part and just tell this function how many bytes are needed?

Hmm, on further thought, I think this was a silly idea. Part of the intended function of this astreamer is to make sure we're only reading WAL files from the archive, and eventually reordering them if required, so obviously something completely generic isn't going to work. Maybe there's a way to make this look a little cleaner and tidier but this isn't it...

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: PostgreSQL 18 GA press release draft
On Fri, Sep 12, 2025 at 12:13:23PM -0700, Jeff Davis wrote:
> On Fri, 2025-09-12 at 13:21 -0500, Nico Williams wrote:
> > How does one do form-insensitive comparisons?
>
> If you mean case insensitive matching, you can do:

I meant form-insensitive, as in comparing equivalent strings where one might be using decomposed sequences and another pre-composed sequences (and either might be normalized or not at all).

Nico
--
Re: RFC: extensible planner state
On Tue, Aug 26, 2025 at 4:58 AM Andrei Lepikhov wrote:
>
> 1. Why is it necessary to maintain the GetExplainExtensionId and GetPlannerExtensionId routines? It seems that using a single extension_id (related to the order of the library inside the file_scanner) is more transparent and more straightforward if necessary.

But this wouldn't work for in-core use cases like GEQO, right? Also, how would it work if there are multiple "extensions" in the same .so file?

> 2. Why does the extensibility approach in 0001 differ from that in 0004? I can imagine it is all about limiting extensions, but anyway, a module has access to PlannerInfo, PlannerGlobal, etc. So, this machinery looks a little redundant, doesn't it?

What do you mean that the extensibility approach differs? Like that the type of extension_state is different?

- Melanie
Re: PostgreSQL 18 GA press release draft
On Fri, 2025-09-12 at 15:48 -0500, Nico Williams wrote:
> I meant form-insensitive, as in comparing equivalent strings where one might be using decomposed sequences and another pre-composed sequences (and either might be normalized or not at all).

Use NORMALIZE(a) = NORMALIZE(b).

This is getting a little off-topic from what's actually being released, so please reopen a more relevant thread or start a new one, and CC me.

Regards,

	Jeff Davis
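For the archives, a concrete example of the form-insensitive comparison being described, using the normalize() function PostgreSQL already provides (it requires a UTF-8 database encoding):

    -- U&'\0065\0301' is "e" plus a combining acute accent (decomposed);
    -- U&'\00E9' is the precomposed "é".  They differ as raw strings but
    -- compare equal once both sides are normalized to the same form:
    SELECT U&'\0065\0301' = U&'\00E9'                                 AS raw_equal,
           normalize(U&'\0065\0301', NFC) = normalize(U&'\00E9', NFC) AS nfc_equal;
    -- raw_equal = f, nfc_equal = t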
Re: ALTER DATABASE RESET with unexistent guc doesn't report an error
I wrote:
> Looking at the patch, the delta in database.out raises another question:

> ALTER DATABASE regression_tbd RENAME TO regression_utf8;
> ALTER DATABASE regression_utf8 SET TABLESPACE regress_tblspace;
> ALTER DATABASE regression_utf8 RESET TABLESPACE;
> +ERROR:  unrecognized configuration parameter "tablespace"
> ALTER DATABASE regression_utf8 CONNECTION_LIMIT 123;

> The author of this bit of test script evidently thought that ALTER ... RESET TABLESPACE is the inverse of ALTER ... SET TABLESPACE, and what we are seeing is that it is not. That may be a bug in itself, but it's not what Vitaly is on about, IIUC.

That was in fact a test bug, now corrected at 4adb0380b. I've substituted a more on-point test case, wordsmithed the comment a little bit, and pushed it. Thanks for the report and patch!

			regards, tom lane
Re: [PG19-3 PATCH] Don't ignore passfile
On Sat, Sep 6, 2025 at 9:52 AM Paul Ohlhauser wrote:
> My point is not that the user is not happy, that they have to change permissions.
> It is that the user would rather get a clear error message than to get two separate messages (warning that doesn't mention "ignore" and authentication error) they need to connect to understand what is going on.

Would it address your concern if we reworded that error message to be more clear that the file is going to be ignored? I think that proposal would have a better chance of success than this one.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Avoid resource leak (src/test/modules/test_binaryheap/test_binaryheap.c)
Robert Haas writes:
> On Fri, Sep 12, 2025 at 1:54 PM Ranier Vilela wrote:
>> Coverity reports this resource leak in test_binaryheap module.
>> I think that is right.

> If this were correct, we'd need to also free the memory in all the error paths. But of course, in both error and non-error paths, we rely on memory context cleanup to free memory for us, except in cases where there's some specific reason to believe that's not good enough. I doubt that there is any such reason in this case.

I agree this isn't interesting from a resource-leak perspective. However, is it interesting from a test-coverage perspective? AFAICS, test_binaryheap doesn't presently exercise binaryheap_free, which seems a little sad for what's supposed to be a unit-test module.

Of course, binaryheap_free is quite trivial and we do already have coverage of it elsewhere. So I'm not super excited about the omission.

			regards, tom lane
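If someone did want to close that small coverage gap, the addition would be tiny. A sketch of what an extra check in test_binaryheap.c might look like; the comparator and function names here are made up, only the binaryheap_* calls follow lib/binaryheap.h:

    #include "postgres.h"
    #include "lib/binaryheap.h"

    /* assumed: a trivial integer comparator like the module already uses */
    static int
    int_cmp(Datum a, Datum b, void *arg)
    {
        int32       ia = DatumGetInt32(a);
        int32       ib = DatumGetInt32(b);

        return (ia > ib) ? 1 : (ia < ib) ? -1 : 0;
    }

    static void
    test_free(void)
    {
        binaryheap *heap = binaryheap_allocate(4, int_cmp, NULL);

        binaryheap_add_unordered(heap, Int32GetDatum(2));
        binaryheap_add_unordered(heap, Int32GetDatum(1));
        binaryheap_build(heap);

        /*
         * Free explicitly so binaryheap_free() gets exercised; memory context
         * cleanup would otherwise reclaim this at the end of the test anyway.
         */
        binaryheap_free(heap);
    }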
Re: RFC: extensible planner state
Hmm. I don't have a copy of Andrei's email in my gmail. I see it in the archives but I have not got it. I don't understand how that happened. I now wonder if there are other emails from Andrei I haven't received.

On Fri, Sep 12, 2025 at 4:34 PM Melanie Plageman wrote:
> On Tue, Aug 26, 2025 at 4:58 AM Andrei Lepikhov wrote:
> > 1. Why is it necessary to maintain the GetExplainExtensionId and GetPlannerExtensionId routines? It seems that using a single extension_id (related to the order of the library inside the file_scanner) is more transparent and more straightforward if necessary.
>
> But this wouldn't work for in-core use cases like GEQO, right? Also, how would it work if there are multiple "extensions" in the same .so file?

We probably don't want all extensions on any topic to be allocating extension IDs from the same space, because the ID is used as a list index and we don't want to have to null-pad lists excessively. Combining the explain and planner cases wouldn't be too much of a stretch, perhaps, but it's also not really costing us anything to have separate IDs for those cases.

> > 2. Why does the extensibility approach in 0001 differ from that in 0004? I can imagine it is all about limiting extensions, but anyway, a module has access to PlannerInfo, PlannerGlobal, etc. So, this machinery looks a little redundant, doesn't it?
>
> What do you mean that the extensibility approach differs? Like that the type of extension_state is different?

I suspect the question here is about why not use the index-by-planner-extension-ID approach for 0004. That could maybe work, but here everything has to be a Node, so I feel like it would be more contorted than the existing cases.

--
Robert Haas
EDB: http://www.enterprisedb.com
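To illustrate the ID-as-list-index point: a sketch of the idea, not code from the patch (the function names and the List-typed state field are assumptions):

    #include "postgres.h"
    #include "nodes/pg_list.h"

    /*
     * Each purpose (EXPLAIN extensions, planner extensions, ...) hands out its
     * own small, dense ID space, so a per-object state list only needs to be
     * null-padded up to the number of extensions of that kind that actually
     * registered, not to every extension loaded into the server.
     */
    static void
    set_extension_state(List **state_list, int extension_id, void *state)
    {
        /* null-pad the list up to this extension's slot */
        while (list_length(*state_list) <= extension_id)
            *state_list = lappend(*state_list, NULL);

        list_nth_cell(*state_list, extension_id)->ptr_value = state;
    }

    static void *
    get_extension_state(List *state_list, int extension_id)
    {
        if (extension_id >= list_length(state_list))
            return NULL;        /* this extension never stored anything */
        return list_nth(state_list, extension_id);
    }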
Re: Proposal: Conflict log history table for Logical Replication
Hi,

On Wed, Sep 10, 2025 at 8:13 PM Amit Kapila wrote:
>
> > How about streaming the conflicts in fixed format to a separate log file other than the regular postgres server log file?
>
> I would prefer this info to be stored in tables as it would be easy to query them. If we use separate LOGs then we should provide some views to query the LOG.

Providing views to query the conflict LOG is easier than having tables (probably we must provide both - logging conflicts to tables and to separate LOG files). However, wanting the conflict logs after failovers is something that makes me think the table approach is better. I'm open to more thoughts here.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Proposal: Conflict log history table for Logical Replication
Hi,

On Fri, Sep 12, 2025 at 3:13 AM Dilip Kumar wrote:
>
> I was looking into another thread where we provide an error table for COPY [1]; it requires the user to pre-create the error table. Inside the COPY command we will validate the table; validation in that context is a one-time process checking for: (1) table existence, (2) ability to acquire a sufficient lock, (3) INSERT privileges, and (4) matching column names and data types. This approach avoids concerns about the user's DROP or ALTER permissions.
>
> Our requirement for the logical replication conflict log table differs, as we must validate the target table upon every conflict insertion, not just at subscription creation. A more robust alternative is to perform validation and acquire a lock on the conflict table whenever the subscription worker starts. This prevents modifications (like ALTER or DROP) while the worker is active. When the worker gets restarted, we can re-validate the table and automatically disable the conflict logging feature if validation fails. It can then be enabled again by ALTER SUBSCRIPTION setting the option.

Having to worry about ALTER/DROP and adding code to protect against them seems like overkill.

> And if we want, in the first version we can expect the user to create the table as per the expected schema and supply it. This avoids the need to handle keeping it from being published, as that becomes the user's responsibility. Then in top-up patches we can also allow creating the table internally if it doesn't exist, and we can find a solution to keep it from being published when ALL TABLES are published.

This looks much simpler to start with.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: [BUG] temporary file usage report with extended protocol and unnamed portals
> Only question is if we should avoid the extra portal hashtable lookup as well, or leave that for a separate patch.

I prefer a separate thread for this, as it's an optimization of the existing behavior.

--
Sami Imseih
Amazon Web Services (AWS)
Re: OAuth client code doesn't work with Google OAuth
> I hear what you're saying, but I don't see how it relates to the paragraph you replied to.

I realized something incorrect in my last email: I wrote that Google doesn't ask permission for the email/profile scope. That's not true; it didn't show the scope screen because I had already given permission for the application and it remembered it.

So that made me wonder: how does Okta handle it? And it turns out it's the same.

1. I defined a custom scope in Okta, with required explicit consent from the user
2. I logged in for the first time with psql
3. As expected, Okta showed the consent screen and made me explicitly accept that I'm granting "testScope"
4. I logged in for the second time with psql
5. I entered the code to the Okta device login page, and the next screen immediately was "Device activated"

Not even a single page of warning, nothing about scopes/permissions/etc; it didn't even verify/tell me which application I'm logging into. The first page just said "activate your device" (it couldn't know where I want to log in, as that's the generic enter-your-code page), and the next page was:

"Device activated
Follow the instructions on your device for next steps"

That was all the text on it. It still didn't even tell me which device (application) I logged into, just that I succeeded.

There's a "prompt=consent" parameter to the API, and a similar setting in the admin interface, but it doesn't seem to do anything with device flow. So in this sense, this is much worse than Google/Azure...

More searching tells me that typically OIDC providers work this way, and it seems the RFC again doesn't specify when and how often the providers have to show the consent screen. RFC 6819 warns against this, but that's all I could find, and that's a separate RFC.

This makes me question again: how does the scope help pg security in practice? (I know that it's the OIDC provider's fault, but if all providers are like this, how does that help in practice?)

Also, the internet (/android/etc) is full of "login with google" buttons without special scopes, and generally, if somebody logs in to site/app X with Google, he/she can expect to modify things within that app. I'm not saying that this is ideal, but this is how it works in practice, and how people are used to it.

Anyway, I also understand why you don't agree with this, so if you don't want to include Google-specific handling, I understand, I won't argue more for it.

> 2) I don't yet understand how this helps you as a validator author, because you shouldn't know what flow was in use. Forget about device flow for a minute. How does this new information help you if the organization you're installed in is using only PKCE-flow libpq from Postgres 25 or whatever?

Yes, as a validator author, I don't know if somebody is using libpq, a fork of libpq, or a completely different implementation of the protocol (that does or does not support the not entirely compliant Google OAuth). But the administrator configuring the Postgres instance / the validator should be aware of how the authentication flow is configured, and I wouldn't want to restrict the options by saying that something is not supported, especially not blocking one of the most popular services.

> How does this new information help you if the organization you're installed in is using only PKCE-flow libpq from Postgres 25 or whatever?

> Is that still true for PKCE? In general, we're not running confidential clients here; it'd be a lot easier if we were.
I've been thinking about this for the last few days, but shouldn't a proper PKCE implementation require a protocol change, moving part of the logic to the server side? That would solve the scenarios we are talking about; there would be no question of who created the access token and how. Naive PKCE support only on the client side, while still only sending an access token to the server, wouldn't help the security of the server much.

On Thu, Sep 11, 2025 at 4:55 PM Jacob Champion wrote:
>
> On Thu, Sep 11, 2025 at 12:42 AM Zsolt Parragi wrote:
> > It just says that:
> >
> > > Once you enter the code displayed on your app or device, it will have access to your account.
> > > Do not enter codes from sources you don't trust.
> >
> > And that's all. And not loudly, this isn't written with a big font size with a warning sign or anything like that, it's just quietly mentioned.
>
> YMMV; on my screen the last sentence is in bolded font and quite front-and-center, but they haven't used tags or red boxes or anything. In any case I agree it's not enough.
>
> But I don't have control over them. If a hypothetical provider (and I am not saying Entra does this, just to be very clear) were to skip the consent screen entirely, and immediately grant all requested scopes, neither the client nor the validator would be able to figure that out. That would be a provider-side problem, and we're trusting the provider not to do that, or else to com
Re: Set log_lock_waits=on by default
On Fri, 2025-09-12 at 08:06 +0200, Peter Eisentraut wrote: > committed Thank you! Laurenz Albe