Re: Virtual generated columns
On Fri, May 16, 2025 at 1:00 PM Alexander Lakhin wrote: > I've discovered yet another way to trigger that error: > create table vt (a int, b int generated always as (a * 2), c int); > insert into vt values(1); > alter table vt alter column c type bigint using b + c; > > ERROR: XX000: unexpected virtual generated column reference > LOCATION: CheckVarSlotCompatibility, execExprInterp.c:2410 Thank you for the report. It seems that we fail to expand references to virtual generated columns in the NewColumnValues expression when altering tables. We might be able to fix it by: @@ -6203,7 +6203,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap) NewColumnValue *ex = lfirst(l); /* expr already planned */ - ex->exprstate = ExecInitExpr((Expr *) ex->expr, NULL); + ex->exprstate = ExecInitExpr((Expr *) expand_generated_columns_in_expr((Node *) ex->expr, oldrel, 1), NULL); Thanks Richard
Re: Add pg_buffercache_mark_dirty[_all] functions to the pg_buffercache
Hey, I noticed a couple of small clarity issues in the current version of the patch, for potential cleanup:

1. Commit message wording

Right now it says: “The pg_buffercache_mark_dirty_all() function provides an efficient way to dirty the entire buffer pool (e.g., ~550 ms vs. ~70 ms for 16 GB of shared buffers), complementing pg_buffercache_mark_dirty() for more granular control.”

That makes it sound like the *_all* function is the granular one, when really:
• pg_buffercache_mark_dirty(buffernumber) is the fine-grained, per-buffer call.
• pg_buffercache_mark_dirty_all() is the bulk, coarse-grained operation.

How about rephrasing to: “The pg_buffercache_mark_dirty_all() function provides an efficient, bulk way to mark every buffer dirty (e.g., ~70 ms vs. ~550 ms for 16 GB of shared buffers), while pg_buffercache_mark_dirty() still allows per-buffer, granular control.”

2. Inline comment in MarkUnpinnedBufferDirty

We currently have:

PinBuffer_Locked(desc); /* releases spinlock */

Folks who’re unfamiliar with this function might get confused. Maybe we could use the one in GetVictimBuffer:

/* Pin the buffer and then release its spinlock */
PinBuffer_Locked(buf_hdr);

That spelling-out makes it obvious what’s happening.

> Since that patch is targeted for PG 19, pg_buffercache is bumped to
> v1.7.
>
> Latest version is attached and people who already reviewed the patches are
> CCed.
Re: Virtual generated columns
On Fri, May 16, 2025 at 3:26 PM Richard Guo wrote: > > On Fri, May 16, 2025 at 1:00 PM Alexander Lakhin wrote: > > I've discovered yet another way to trigger that error: > > create table vt (a int, b int generated always as (a * 2), c int); > > insert into vt values(1); > > alter table vt alter column c type bigint using b + c; > > > > ERROR: XX000: unexpected virtual generated column reference > > LOCATION: CheckVarSlotCompatibility, execExprInterp.c:2410 > > Thank you for the report. It seems that we fail to expand references > to virtual generated columns in the NewColumnValues expression when > altering tables. We might be able to fix it by: > > @@ -6203,7 +6203,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap) > NewColumnValue *ex = lfirst(l); > > /* expr already planned */ > - ex->exprstate = ExecInitExpr((Expr *) ex->expr, NULL); > + ex->exprstate = ExecInitExpr((Expr *) > expand_generated_columns_in_expr((Node *) ex->expr, oldrel, 1), NULL); > we have used the USING expression in ATPrepAlterColumnType, ATColumnChangeRequiresRewrite. expanding it on ATPrepAlterColumnType seems to make more sense? @@ -14467,7 +14467,7 @@ ATPrepAlterColumnType(List **wqueue, */ newval = (NewColumnValue *) palloc0(sizeof(NewColumnValue)); newval->attnum = attnum; - newval->expr = (Expr *) transform; + newval->expr = (Expr *) expand_generated_columns_in_expr(transform, rel, 1); newval->is_generated = false;
Re: Backward movement of confirmed_flush resulting in data duplication.
Hi, On Tue, May 13, 2025 at 3:48 PM shveta malik wrote: > > > With the given script, the problem reproduces on Head and PG17. We are > trying to reproduce the issue on PG16 and below where injection points > are not there. > The issue can also be reproduced on PostgreSQL versions 13 through 16. The same steps shared earlier in the 'reproduce_data_duplicate_without_twophase.sh' script can be used to reproduce the issue on versions PG14 to PG16. Since back branches do not support injection points, you can add infinite loops at the locations where the patch 'v1-0001-Injection-points-to-reproduce-the-confirmed_flush.patch introduces injection points'. These loops allow holding and releasing processes using a debugger when needed. Attached are detailed documents describing the reproduction steps: 1) Use 'reproduce_steps_for_pg14_to_16.txt' for PG14 to PG16. 2) Use 'reproduce_steps_for_pg13.txt' for PG13. Note: PG13 uses temporary replication slots for tablesync workers, unlike later versions that use permanent slots. Because of this difference, some debugger-related steps differ slightly in PG13, which is why a separate document is provided for it. -- Thanks, Nisha Below are the detailed steps to follow to reproduce the issue on PG14 to PG16 versions using debugger: (Note: Since these steps are intended to be run manually, short delays like sleep 1 between steps are assumed and not explicitly mentioned. Any wait time longer than one second is explicitly called out.) 1. Set up the primary and subscriber nodes with the same configurations as shared in reproduce_data_duplicate_without_twophase.sh. (The script can be used to do the initial setup) 2. On Primary: Create table tab1, insert a value and create a publication psql -d postgres -p $port_primary -c "CREATE TABLE tab1(a int); INSERT INTO tab1 VALUES(1); CREATE PUBLICATION pub FOR TABLE tab1;" 3. On Subscriber: Create the same table tab1 psql -d postgres -p $port_subscriber -c "CREATE TABLE tab1(a int);" 4. On Subscriber: Start the subscription with copy_data to false psql -d postgres -p $port_subscriber -c "CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres port=$port_primary' PUBLICATION pub WITH (slot_name='logicalslot', create_slot=true, copy_data = false, enabled=true)" 5. Primary: Confirm the slot details psql -d postgres -p $port_primary -c "SELECT slot_name, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots WHERE slot_name='logicalslot'" 6. Insert the data into tab1. The apply worker's origin_lsn and slot's confirmed_flush will be advanced to this INSERT lsn (say lsn1) psql -d postgres -p $port_primary -c "INSERT INTO tab1 VALUES(2);" 7. Check both confirmed_flush and origin_lsn values, both values should now match the LSN of the insert above (lsn1). psql -d postgres -p $port_subscriber -c "select * from pg_replication_origin_status where local_id = 1;" psql -d postgres -p $port_primary -c "SELECT slot_name, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots WHERE slot_name='logicalslot'" 8. Add a new table (tab2) to the publication on Primary psql -d postgres -p $port_primary -c "CREATE TABLE tab2 (a int UNIQUE); ALTER PUBLICATION pub ADD TABLE tab2;" 9. Create tab2 on Subscriber psql -d postgres -p $port_subscriber -c "CREATE TABLE tab2 (a int UNIQUE);" 10. Refresh the subscription. It will start tablesync for tab2. psql -d postgres -p $port_subscriber -c "ALTER SUBSCRIPTION sub REFRESH PUBLICATION" 11. Attach debugger to the tablesync worker and hold it just before it sets the state to SUBREL_STATE_SYNCWAIT. 12. 
On Primary: Insert a row into tab2. Lets say the remote lsn for this change is lsn2. psql -d postgres -p $port_primary -c "INSERT INTO tab2 VALUES(2);" 13. Wait for 3+ seconds. The above insert will not be consumed by tablesync worker on sub yet. Apply worker will see this change and will ignore it. 14. Check that confirmed_flush has moved to lsn2 now (where lsn2 > lsn1 ) due to keepalive message handling in apply worker. And origin_lsn remains unchanged. psql -d postgres -p $port_subscriber -c "select * from pg_replication_origin_status where local_id = 1;" psql -d postgres -p $port_primary -c "SELECT slot_name, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots WHERE slot_name='logicalslot'" 15. Attach another debugger to apply-worker process and hold it just before maybe_reread_subscription() call. 16. Release the tablesync worker from debugger - It will now move to SUBREL_STATE_SYNCWAIT state and will wait for apply worker to move to SUBREL_STATE_CATCHUP. 17. Disable the sub psql -d postgres -p $port_subscriber -c "alter subscription sub disable;" 18. Release the apply worker from debugger. - It will re-read subscription and move the state to SUBREL_STATE_CATCHUP. Then will exit due to sub being disabled. Tablesync will also exit here. 19. Enable the subscription again and let the apply worker start. - Tablesync now catchup
wrong query results on bf leafhopper
Hi, I noticed this recent BF failure: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=leafhopper&dt=2025-05-15%2008%3A10%3A04 === dumping /home/bf/proj/bf/build-farm-17/HEAD/pgsql.build/src/test/recovery/tmp_check/regression.diffs === diff -U3 /home/bf/proj/bf/build-farm-17/HEAD/pgsql.build/src/test/regress/expected/memoize.out /home/bf/proj/bf/build-farm-17/HEAD/pgsql.build/src/test/recovery/tmp_check/results/memoize.out --- /home/bf/proj/bf/build-farm-17/HEAD/pgsql.build/src/test/regress/expected/memoize.out 2025-05-15 08:10:04.211926695 + +++ /home/bf/proj/bf/build-farm-17/HEAD/pgsql.build/src/test/recovery/tmp_check/results/memoize.out 2025-05-15 08:18:29.117733601 + @@ -42,7 +42,7 @@ -> Nested Loop (actual rows=1000.00 loops=N) -> Seq Scan on tenk1 t2 (actual rows=1000.00 loops=N) Filter: (unique1 < 1000) - Rows Removed by Filter: 9000 + Rows Removed by Filter: 8982 -> Memoize (actual rows=1.00 loops=N) Cache Key: t2.twenty Cache Mode: logical @@ -178,7 +178,7 @@ -> Nested Loop (actual rows=1000.00 loops=N) -> Seq Scan on tenk1 t1 (actual rows=1000.00 loops=N) Filter: (unique1 < 1000) - Rows Removed by Filter: 9000 + Rows Removed by Filter: 8981 -> Memoize (actual rows=1.00 loops=N) Cache Key: t1.two, t1.twenty Cache Mode: binary For a moment I thought this could be a bug in memoize, but that doesn't actually make sense - the failure isn't in memoize, it's the seqscan. Subsequently I got worried that this is an AIO bug or such causing wrong query results. But there are instances of this error well before AIO was merged. E.g. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=leafhopper&dt=2024-12-18%2023%3A35%3A04 The same error is also present down to 16. In 15, I saw a potentially related error https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=leafhopper&dt=2024-12-16%2023%3A43%3A03 There have been other odd things on leafhopper, see e.g.: https://www.postgresql.org/message-id/35d87371-f3ab-42c8-9aac-bb39ab5bd987%40gmail.com https://postgr.es/m/Z4npAKvchWzKfb_r%40paquier.xyz Greetings, Andres Freund
C extension compilation failed while using PG 17.2 on mac m1
Hello, while compiling my C extension with PG 17.2 I am getting

postgresql-17.2/pgsql/include/server/port/pg_iovec.h:93:10: error: call to undeclared function 'pwritev'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
93 | return pwritev(fd, iov, iovcnt, offset);

I am not using pg_iovec.h or pwrite/pread related functions. My extension compiles fine without any issues on PG 14.3.

*PG compilation commands:*
./configure --prefix=$PWD/pgsql --enable-cassert --enable-debug CFLAGS="-ggdb -fno-omit-frame-pointer -O0" --without-icu
make clean
make -j
make install -j

*Mac OS:* 15.1.1

Please let me know if you need any other information. Thanks & Regards, Narayana
Re: Align wording on copyright organization
On Fri, 16 May 2025 at 09:12, Daniel Gustafsson wrote: > As was briefly discussed in the developers meeting, and the hallway track, > at > PGConf.dev we have a few variations on the organization wording which > really > should be aligned with the wording defined by -core in pgweb commit > 2d764dbc08. > Thanks Daniel - the patch looks good to me. You'll push this to all active branches, correct? -- Dave Page pgAdmin: https://www.pgadmin.org PostgreSQL: https://www.postgresql.org pgEdge: https://www.pgedge.com
Re: Proposal: Make cfbot fail on patches not created by "git format-patch"
On Fri, May 16, 2025 at 12:12 PM Tom Lane wrote: > That outcome seems entirely horrible to me. If you want to flag the lack > of a commit message somehow, fine, but don't prevent CI from running. Personally I also prefer nudges to gates. Just like people already deprioritize "Waiting on Author" entries a bit, having an obvious "Patch Needs Work" note might gently help newcomers iterate on their first submissions (or even communicate where a patch is in the lifecycle! e.g. a Bugfix entry where the patch is marked incomplete might motivate someone to jump in to fix it). --Jacob
Re: Make wal_receiver_timeout configurable per subscription
On Fri, May 16, 2025 at 9:11 PM Fujii Masao wrote: > Hi, > > When multiple subscribers connect to different publisher servers, > it can be useful to set different wal_receiver_timeout values for > each connection to better detect failures. However, this isn't > currently possible, which limits flexibility in managing subscriptions. > > Hi,+1 for the idea. > > One approach is to add wal_receiver_timeout as a parameter to > CREATE SUBSCRIPTION command, storing it in pg_subscription > so each logical replication worker can use its specific value. > > Another option is to change the wal_receiver_timeout's GUC context > from PGC_SIGHUP to PGC_USERSET. This would allow setting different > values via ALTER ROLE SET command for each subscription owner - > effectively enabling per-subscription configuration. Since this > approach is simpler and likely sufficient, I'd prefer starting with this. > Thought? Both ways LGTM,for starters we can go with changing GUC's context. > BTW, this could be extended in the future to other GUCs used by > logical replication workers, such as wal_retrieve_retry_interval. > > +1 for extending this idea for other GUCs as well. -- Thanks, Srinath Reddy Sadipiralla EDB: https://www.enterprisedb.com/
Re: Statistics Import and Export
Gentle ping on this. --- Hari Krishna Sunder On Wed, May 14, 2025 at 1:30 PM Hari Krishna Sunder wrote: > Thanks Nathan. > Here is the patch with a comment. > > On Wed, May 14, 2025 at 8:53 AM Nathan Bossart > wrote: > >> On Tue, May 13, 2025 at 05:01:02PM -0700, Hari Krishna Sunder wrote: >> > We found a minor issue when testing statistics import with upgrading >> from >> > versions older than v14. (We have VACUUM and ANALYZE disabled) >> > 3d351d916b20534f973eda760cde17d96545d4c4 >> > < >> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3d351d916b20534f973eda760cde17d96545d4c4 >> > >> > changed >> > the default value for reltuples from 0 to -1. So when such tables are >> > imported they get the pg13 default of 0 which in pg18 is treated >> > as "vacuumed and seen to be empty" instead of "never yet vacuumed". The >> > planner then proceeds to pick seq scans even if there are indexes for >> these >> > tables. >> > This is a very narrow edge case and the next VACUUM or ANALYZE will fix >> it >> > but the perf of these tables immediately after the upgrade is >> considerably >> > affected. >> >> There was a similar report for vacuumdb's new --missing-stats-only option. >> We fixed that in commit 9879105 by removing the check for reltuples != 0, >> which means that --missing-stats-only will process empty tables. >> >> > Can we instead use -1 if the version is older than 14, and reltuples is >> 0? >> > This will have the unintended consequence of treating a truly empty >> table >> > as "never yet vacuumed", but that should be fine as empty tables are >> going >> > to be fast regardless of the plan picked. >> >> I'm inclined to agree that we should do this. Even if it's much more >> likely that 0 means empty versus not-yet-processed, the one-time cost of >> processing some empty tables doesn't sound too bad. In any case, since >> this only applies to upgrades from > over time. >> >> > PS: This is my first patch, so apologies for any issues with the patch. >> >> It needs a comment, but otherwise it looks generally reasonable to me >> after >> a quick glance. >> >> -- >> nathan >> >
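For readers following along, the adjustment being discussed amounts to something like the following sketch. The variable names and the exact spot in the statistics-import path are assumptions for illustration only, not taken from the attached patch:

    /*
     * Illustrative sketch: source clusters older than v14 initialized
     * reltuples to 0, so a stored 0 most likely means "never yet vacuumed"
     * rather than "seen to be empty"; map it to the modern -1 convention.
     * The trade-off is that genuinely empty tables are also treated as
     * not-yet-vacuumed, which the next VACUUM or ANALYZE corrects cheaply.
     */
    if (source_version < 140000 && reltuples == 0)
        reltuples = -1;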
Re: PG 17.2 compilation fails with -std=c11 on mac
I wrote: > We lack that #define, which results in not supplying > the declaration. AFAICT the requirement for this is in the C11 > standard, this is not just Apple doing something weird. > Aside from this compilation error, we're probably failing to use > memset_s on some platforms where it's available, for lack of > the #define in our configure/meson probes. I know how to fix > that in configure, but I'm less clear on how nonstandard probes > work in meson --- any help there? Here's a lightly-tested fix for that (against HEAD, but it likely works on v17 too). regards, tom lane diff --git a/configure b/configure index 275c67ee67c..4f15347cc95 100755 --- a/configure +++ b/configure @@ -15616,7 +15616,7 @@ fi LIBS_including_readline="$LIBS" LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'` -for ac_func in backtrace_symbols copyfile copy_file_range elf_aux_info getauxval getifaddrs getpeerucred inet_pton kqueue localeconv_l mbstowcs_l memset_s posix_fallocate ppoll pthread_is_threaded_np setproctitle setproctitle_fast strsignal syncfs sync_file_range uselocale wcstombs_l +for ac_func in backtrace_symbols copyfile copy_file_range elf_aux_info getauxval getifaddrs getpeerucred inet_pton kqueue localeconv_l mbstowcs_l posix_fallocate ppoll pthread_is_threaded_np setproctitle setproctitle_fast strsignal syncfs sync_file_range uselocale wcstombs_l do : as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh` ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var" @@ -16192,6 +16192,19 @@ cat >>confdefs.h <<_ACEOF #define HAVE_DECL_STRCHRNUL $ac_have_decl _ACEOF +ac_fn_c_check_decl "$LINENO" "memset_s" "ac_cv_have_decl_memset_s" "#define __STDC_WANT_LIB_EXT1__ 1 +#include +" +if test "x$ac_cv_have_decl_memset_s" = xyes; then : + ac_have_decl=1 +else + ac_have_decl=0 +fi + +cat >>confdefs.h <<_ACEOF +#define HAVE_DECL_MEMSET_S $ac_have_decl +_ACEOF + # This is probably only present on macOS, but may as well check always ac_fn_c_check_decl "$LINENO" "F_FULLFSYNC" "ac_cv_have_decl_F_FULLFSYNC" "#include diff --git a/configure.ac b/configure.ac index 7ea91d56adb..4b8335dc613 100644 --- a/configure.ac +++ b/configure.ac @@ -1792,7 +1792,6 @@ AC_CHECK_FUNCS(m4_normalize([ kqueue localeconv_l mbstowcs_l - memset_s posix_fallocate ppoll pthread_is_threaded_np @@ -1838,6 +1837,8 @@ AC_CHECK_DECLS([strlcat, strlcpy, strnlen, strsep, timingsafe_bcmp]) AC_CHECK_DECLS([preadv], [], [], [#include ]) AC_CHECK_DECLS([pwritev], [], [], [#include ]) AC_CHECK_DECLS([strchrnul], [], [], [#include ]) +AC_CHECK_DECLS([memset_s], [], [], [#define __STDC_WANT_LIB_EXT1__ 1 +#include ]) # This is probably only present on macOS, but may as well check always AC_CHECK_DECLS(F_FULLFSYNC, [], [], [#include ]) diff --git a/meson.build b/meson.build index 12de5e80c31..d142e3e408b 100644 --- a/meson.build +++ b/meson.build @@ -2654,6 +2654,7 @@ decl_checks += [ ['preadv', 'sys/uio.h'], ['pwritev', 'sys/uio.h'], ['strchrnul', 'string.h'], + ['memset_s', 'string.h', '#define __STDC_WANT_LIB_EXT1__ 1'], ] # Check presence of some optional LLVM functions. 
@@ -2667,21 +2668,23 @@ endif foreach c : decl_checks func = c.get(0) header = c.get(1) - args = c.get(2, {}) + prologue = c.get(2, '') + args = c.get(3, {}) varname = 'HAVE_DECL_' + func.underscorify().to_upper() found = cc.compiles(''' -#include <@0@> +@0@ +#include <@1@> int main() { -#ifndef @1@ -(void) @1@; +#ifndef @2@ +(void) @2@; #endif return 0; } -'''.format(header, func), +'''.format(prologue, header, func), name: 'test whether @0@ is declared'.format(func), # need to add cflags_warn to get at least # -Werror=unguarded-availability-new if applicable @@ -2880,7 +2883,6 @@ func_checks = [ ['kqueue'], ['localeconv_l'], ['mbstowcs_l'], - ['memset_s'], ['mkdtemp'], ['posix_fadvise'], ['posix_fallocate'], diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in index c3cc9fa856d..726a7c1be1f 100644 --- a/src/include/pg_config.h.in +++ b/src/include/pg_config.h.in @@ -91,6 +91,10 @@ `LLVMCreatePerfJITEventListener', and to 0 if you don't. */ #undef HAVE_DECL_LLVMCREATEPERFJITEVENTLISTENER +/* Define to 1 if you have the declaration of `memset_s', and to 0 if you + don't. */ +#undef HAVE_DECL_MEMSET_S + /* Define to 1 if you have the declaration of `posix_fadvise', and to 0 if you don't. */ #undef HAVE_DECL_POSIX_FADVISE @@ -291,9 +295,6 @@ /* Define to 1 if you have the header file. */ #undef HAVE_MEMORY_H -/* Define to 1 if you have the `memset_s' function. */ -#undef HAVE_MEMSET_S - /* Define to 1 if you have the `mkdtemp' function. */ #undef HAVE_MKDTEMP diff --git a/src/port/explicit_bzero.c b/src/port/explicit_bzero.c index 1d37b119bab..358a692f357 100644 --- a/src/port/explicit_bzero.c +++ b/src/port/explicit_bzero.c @@ -12,9 +12,11 @@ *---
Re: Should we optimize the `ORDER BY random() LIMIT x` case?
On 16/05/2025 15:01, Tom Lane wrote: Aleksander Alekseev writes: If I'm right about the limitations of aggregate functions and SRFs this leaves us the following options: 1. Changing the constraints of aggregate functions or SRFs. However I don't think we want to do it for such a single niche scenario. 2. Custom syntax and a custom node. 3. To give up Seems to me the obvious answer is to extend TABLESAMPLE (or at least, some of the tablesample methods) to allow it to work on a subquery. Isn't this a job for <fetch first clause>? Example: SELECT ... FROM ... JOIN ... FETCH SAMPLE FIRST 10 ROWS ONLY Then the nodeLimit could do some sort of reservoir sampling. There are several enhancements to <fetch first clause> coming down the pipe, this could be one of them. -- Vik Fearing
Re: Should we optimize the `ORDER BY random() LIMIT x` case?
Vik Fearing writes: > On 16/05/2025 15:01, Tom Lane wrote: >> Seems to me the obvious answer is to extend TABLESAMPLE (or at least, some >> of the tablesample methods) to allow it to work on a subquery. > Isn't this a job for <fetch first clause>? > FETCH SAMPLE FIRST 10 ROWS ONLY How is that an improvement on TABLESAMPLE? Or did the committee forget that they already have that feature? TABLESAMPLE seems strictly better to me here because it affords the opportunity to specify one of several methods, which seems like it would be useful in this context. regards, tom lane
Re: Should we optimize the `ORDER BY random() LIMIT x` case?
On 16/05/2025 23:21, Tom Lane wrote: Vik Fearing writes: On 16/05/2025 15:01, Tom Lane wrote: Seems to me the obvious answer is to extend TABLESAMPLE (or at least, some of the tablesample methods) to allow it to work on a subquery. Isn't this a job for <fetch first clause>? FETCH SAMPLE FIRST 10 ROWS ONLY How is that an improvement on TABLESAMPLE? Or did the committee forget that they already have that feature? TABLESAMPLE seems strictly better to me here because it affords the opportunity to specify one of several methods, which seems like it would be useful in this context. TABLESAMPLE is hitched to a <table primary> which can be basically anything resembling a relation. So it appears the standard already allows this and we just need to implement it. -- Vik Fearing
Re: Should we optimize the `ORDER BY random() LIMIT x` case?
On Fri, May 16, 2025 at 09:01:50AM -0400, Tom Lane wrote: > Seems to me the obvious answer is to extend TABLESAMPLE (or at least, some > of the tablesample methods) to allow it to work on a subquery. The key here is that we need one bit of state between rows: the count of rows seen so far.
Re: Should we optimize the `ORDER BY random() LIMIT x` case?
On Fri, May 16, 2025 at 11:10:49PM +0200, Vik Fearing wrote: > Isn't this a job for <fetch first clause>? > > Example: > > SELECT ... > FROM ... JOIN ... > FETCH SAMPLE FIRST 10 ROWS ONLY > > Then the nodeLimit could do some sort of reservoir sampling. The query might return fewer than N rows. What reservoir sampling requires is this bit of state: the count of input rows so far. The only way I know of to keep such state in a SQL query is with a RECURSIVE CTE, but unfortunately that would require unbounded CTE size, and it would require a way to query next rows one per iteration. Nico --
Foreign key isolation tests
Here are a couple new isolation tests for foreign keys, based on feedback from the Advanced Patch Review session at PGConf.dev. The goal is to show that temporal foreign keys do not violate referential integrity under concurrency. Non-temporal foreign keys use a crosscheck snapshot to avoid this. Temporal foreign keys do the same, but this test gives us more assurance. I noticed that traditional foreign keys didn't have a test either, at least for the case we discussed: one transaction INSERTS a referencing row, while the other DELETEs the referenced row. So I have two patches here: one adding tests for the traditional case; another, the temporal case. There is a final patch improving the comment for some snapshot macros. While fixing a typo I thought I could improve their clarity a bit as well. Yours, -- Paul ~{:-) p...@illuminatedcomputing.com v1-0003-Improve-comment-about-snapshot-macros.patch Description: Binary data v1-0001-Fill-testing-gap-for-possible-referential-integri.patch Description: Binary data v1-0002-Add-test-for-temporal-referential-integrity.patch Description: Binary data
Re: Should we optimize the `ORDER BY random() LIMIT x` case?
On Thu, May 15, 2025 at 02:41:15AM +0300, Aleksander Alekseev wrote: > SELECT t.ts, t.city, t.temperature, h.humidity > FROM temperature AS t > LEFT JOIN LATERAL > ( SELECT * FROM humidity > WHERE city = t.city AND ts <= t.ts > ORDER BY ts DESC LIMIT 1 > ) AS h ON TRUE > WHERE t.ts < '2022-01-05'; > ``` > > One can do `SELECT (the query above) ORDER BY random() LIMIT x` but > this produces an inefficient plan. Alternatively one could create I assume it gathers the whole result and _then_ orders by random(), yes? I think the obvious thing to do is to have the optimizer recognize `ORDER BY random() LIMIT x` and do the reservoir sampling thing as the underlying query produces rows. The engine needs to keep track of how many rows it's seen so far so that with each new row it can use `random() < 1/n` to decide whether to keep the row. > temporary tables using `CREATE TEMP TABLE ... AS SELECT * FROM tbl > TABLESAMPLE BERNOULLI(20)` but this is inconvenient and would be > suboptimal even if we supported global temporary tables. Speaking of which, what would it take to have global temp tables? > 1. Do you think there might be value in addressing this issue? Yes! I think `ORDER BY random() LIMIT x` is succinct syntax, and semantically it should yield as uniformly distributed a result set as possible even when the size of the underlying query is not knowable a priori. And it should be as optimal as possible considering that the underlying query's result set is in principle -and many useful real use cases- unbounded. And reservoir sampling is a suitable existing technique to get an optimal and uniform random finite sampling of an indefinite size row set. > 2. If yes, how would you suggest addressing it from the UI point of > view - by adding a special syntax, some sort of aggregate function, or > ...? `ORDER BY random() LIMIT x` is a) in the standard, b) semantically what you're asking for, c) succint syntax, d) implementable optimally, therefore IMO `ORDER BY random() LIMIT x` is the correct syntax. Nico --
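For readers unfamiliar with the technique, below is a minimal sketch of reservoir sampling (Algorithm R) in plain C, generalized to keep x rows; for x = 1 it reduces to keeping each new row with probability 1/n, as described above. The struct and function names are illustrative, not from any patch:

#include <stdlib.h>

/*
 * Keep a uniform random sample of "limit" rows from an input stream of
 * unknown length.  The only state carried between rows is the reservoir
 * itself and the count of rows seen so far.
 */
typedef struct Reservoir
{
    int     limit;      /* target sample size (the LIMIT x) */
    long    seen;       /* number of input rows seen so far */
    int    *rows;       /* sampled payload; an int row id for simplicity */
} Reservoir;

void
reservoir_add(Reservoir *r, int row)
{
    r->seen++;
    if (r->seen <= r->limit)
        r->rows[r->seen - 1] = row;     /* fill the reservoir first */
    else
    {
        /* keep the new row with probability limit/seen */
        long    slot = random() % r->seen;

        if (slot < r->limit)
            r->rows[slot] = row;
    }
}

int
main(void)
{
    int         sample[10];
    Reservoir   r = {10, 0, sample};

    for (int row = 0; row < 100000; row++)
        reservoir_add(&r, row);

    /* sample[] now holds 10 uniformly chosen row ids */
    return 0;
}

After the last input row the reservoir holds a uniform sample regardless of how many rows the input produced, which is exactly the single pass over an unbounded input described above.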
pg_upgrade ability to create extension from scripts
It's my understanding that right now when you run pg_upgrade it creates the extension from what exists in the to-be-upgraded databases. Is there a reason why we can't have some sort of switch option that allows the CREATE EXTENSION to run from the scripts instead of what is loaded in the db? I'm asking for a couple of reasons.

1) For projects that version their libraries, sometimes the old versioned library does not exist in the new install and is not available for it. The PostGIS project largely addressed this issue by major-versioning our extension, but of course this would still be an issue for folks going from PostGIS 2.5 to PostGIS 3+.

2) Yes, I know this is a no-no, but the postgis spatial_ref_sys table is assumed to be an essentially static table, and people build table constraints using ST_Transform. Also, as we observed, for some reason the postgis.spatial_ref_sys table is sometimes not created before user data is loaded, or is empty and loaded after user data. As discussed here - https://trac.osgeo.org/postgis/ticket/5899

3) People often forget to run the extension upgrade scripts after doing a pg_upgrade and so are left in an incomplete upgrade state. Having the extension be the latest version would largely solve this.

There are other issues we've run into in the past where a user's old extension cannot be upgraded cleanly; we've had this recently with BRIN. Note I'm not asking to change the default behavior, just wondering if there are obstacles to giving users an option to install from the scripts instead of what's currently in the database. Thanks, Regina
Re: Proposal: Make cfbot fail on patches not created by "git format-patch"
On Fri, 16 May 2025 at 12:24, Jacob Champion wrote: > > On Fri, May 16, 2025 at 12:12 PM Tom Lane wrote: > > That outcome seems entirely horrible to me. If you want to flag the lack > > of a commit message somehow, fine, but don't prevent CI from running. > > Personally I also prefer nudges to gates. Okay, reasonable feedback. How about we add a CI job that does a "quality check". That's much less strong, as all the other tests will still run, but people would get a failing CI job which tells them that something is wrong. We could also include a pgindent in that CI job.
Re: Proposal: Make cfbot fail on patches not created by "git format-patch"
Jelte Fennema-Nio writes: > Okay, reasonable feedback. How about we add a CI job that does a > "quality check". That's much less strong, as all the other tests will > still run, but people would get a failing CI job which tells them that > something is wrong. We could also include a pgindent in that CI job. WFM regards, tom lane
Re: Add comment explaining why queryid is int64 in pg_stat_statements
Hi Ilia Evdokimov, While it's true that no arithmetic or logical operations are performed on queryid after the assignment, the overflow technically occurs at the point of assignment itself. For example, entry->key.queryid holds the value 12747288675711951805 as a uint64, but after assigning it to queryid (which is an int64), it becomes -5699455397997599811 due to overflow. This conversion is intentional - most likely to match the bigint type of the queryid column in pg_stat_statements. However, without an explicit comment, this can be misleading. A beginner reading this might misinterpret it as an unintentional overflow or bug and raise unnecessary concerns. Therefore, it’s worth adding a brief comment clarifying the intent behind this assignment. Thanks & Regards, Shaik Mohammad Mujeeb Member Technical Staff Zoho Corp On Fri, 16 May 2025 15:12:41 +0530 Ilia Evdokimov wrote --- On 15.05.2025 10:08, Shaik Mohammad Mujeeb wrote: I don't think the comment is necessary here. There are no arithmetic or logical operations performed on it. It is only passed as a Datum. -- Best regards, Ilia Evdokimov, Tantor Labs LLC. Hi Developers, In pg_stat_statements.c, the function pg_stat_statements_internal() declares the queryid variable as int64, but assigns it a value of type uint64. At first glance, this might appear to be an overflow issue. However, I think this is intentional - the queryid is cast to int64 to match the bigint type of the queryid column in the pg_stat_statements view. Please find the attached patch, which adds a clarifying comment to make the rationale explicit and avoid potential confusion for future readers. Thanks and Regards, Shaik Mohammad Mujeeb Member Technical Staff Zoho Corp
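As a standalone illustration of the conversion being described (ordinary C, not the pg_stat_statements code itself): assigning a uint64 value above INT64_MAX to an int64 keeps the same 64-bit pattern but reinterprets it as a negative number on the two's-complement platforms PostgreSQL runs on.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    /* the example value quoted above */
    uint64_t    raw = UINT64_C(12747288675711951805);
    int64_t     as_signed = (int64_t) raw;  /* what the bigint column exposes */

    printf("unsigned: %" PRIu64 "\n", raw);         /* 12747288675711951805 */
    printf("signed:   %" PRId64 "\n", as_signed);   /* -5699455397997599811 */
    return 0;
}

The bits are unchanged; only the interpretation differs, which is why exposing the value through the view's bigint column loses no information.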
Re: Proposal: Make cfbot fail on patches not created by "git format-patch"
On Fri, 16 May 2025 at 12:05, Daniel Gustafsson wrote: > > > On 16 May 2025, at 11:52, Jelte Fennema-Nio wrote: > > > Does anyone have strong opposition to this? To be clear, it means we don't > > run CI on patches created by piping "git diff" to a file anymore, as a way > > to nudge submitters into providing useful commit messages. > > Disclaimer: I wasn't in the session (due to conflicting interesting sessions) > so I might be raising points/questions already answered. > > Is this really lowering the bar for new contributors? I've always held "be > liberal in what you accept" as a gold standard for projects I'm involved in, > to > remove barriers to entry. Good commit messages are obviously very important, > but having your patch rejected (yes, I know, failing to apply) might not be > strongest motivator for achieving this. Lowering the bar for new contributors wasn't the purpose of this change in policy. It's meant to reduce the work that committers and reviewers have to do, which then in turn would result in quicker reviews/commits. In my experience with other open source projects new contributors are usually fine with adhering to project standards, if they are told what those standards are. e.g. these days basically every popular open source project is running a CI job that fails if the auto-formatter fails.
Re: Conflict detection for update_deleted in logical replication
On Fri, May 16, 2025 at 12:01 PM shveta malik wrote: > > On Fri, May 16, 2025 at 11:15 AM Amit Kapila wrote: > > > > > > BTW, another related point is that when we decide to stop retaining > > dead tuples (via should_stop_conflict_info_retention), should we also > > consider the case that the apply worker didn't even try to get the > > publisher status because previously it decided that > > oldest_nonremovable_xid cannot be advanced due to its > > OldestActiveTransactionId? > > > > Do you mean avoid stop-conflict-retention in such a case as apply > worker itself did not request status from the publisher? If I > understood your point correctly, then we can do that by advancing the > timer to a new value even if we did not update candidate-xid and did > not ask the status from the publisher. > But candidate_xid_time is also used in wait_for_local_flush() to check clock_skew between publisher and subscriber, so for that purpose, it is better to set it along with candidate_xid. However, can't we rely on the valid value of candidate_xid to ensure that apply worker didn't send any request? Note that we always reset candidate_xid once we have updated oldest_nonremovable_xid. -- With Regards, Amit Kapila.
Re: Add comment explaining why queryid is int64 in pg_stat_statements
On 15.05.2025 10:08, Shaik Mohammad Mujeeb wrote: Hi Developers, In pg_stat_statements.c, the function /pg_stat_statements_internal()/ declares the /queryid/ variable as *int64*, but assigns it a value of type *uint64*. At first glance, this might appear to be an overflow issue. However, I think this is intentional - the queryid is cast to /int64/ *to match the bigint type of the queryid column in the pg_stat_statements view*. Please find the attached patch, which adds a clarifying comment to make the rationale explicit and avoid potential confusion for future readers. Thanks and Regards, Shaik Mohammad Mujeeb Member Technical Staff Zoho Corp I don't think the comment is necessary here. There are no arithmetic or logical operations performed on it. It is only passed as a Datum. -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
PG 17.2 compilation fails with -std=c11 on mac
Hello, when I try to compile postgres 17.2 with -std=c11 I am getting the below error on mac:

explicit_bzero.c:22:9: error: call to undeclared function 'memset_s'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
22 | (void) memset_s(buf, len, 0, len);

*PG compilation commands:*
./configure --prefix=$PWD/pgsql --enable-cassert --enable-debug CFLAGS="-ggdb -fno-omit-frame-pointer -O0 -std=c11" --without-icu
make -s clean
make -s -j
make install -j

*Mac OS:* 15.1.1

*Temporary fix:* without *-std=c11*, or replacing *memset_s* with *memset* in *explicit_bzero*, it is working.

*The same issue was reported in 16.2:* https://www.postgresql.org/message-id/flat/CAJ4xhPmqZuYb2ydsKqkfm7wSnmVMD2cqQ%2By51qhTWkb-SfVG-w%40mail.gmail.com

Thanks & Regards Narayana
Re: Align wording on copyright organization
Dave Page writes: > On Fri, 16 May 2025 at 09:12, Daniel Gustafsson wrote: >> As was briefly discussed in the developers meeting, and the hallway track, >> at >> PGConf.dev we have a few variations on the organization wording which >> really >> should be aligned with the wording defined by -core in pgweb commit >> 2d764dbc08. > Thanks Daniel - the patch looks good to me. > You'll push this to all active branches, correct? +1 --- thanks Daniel, I had this on my to-do list also. regards, tom lane
Re: Should we optimize the `ORDER BY random() LIMIT x` case?
Aleksander Alekseev writes: > If I'm right about the limitations of aggregate functions and SRFs > this leaves us the following options: > 1. Changing the constraints of aggregate functions or SRFs. However I > don't think we want to do it for such a single niche scenario. > 2. Custom syntax and a custom node. > 3. To give up Seems to me the obvious answer is to extend TABLESAMPLE (or at least, some of the tablesample methods) to allow it to work on a subquery. regards, tom lane
Align wording on copyright organization
As was briefly discussed in the developers meeting, and the hallway track, at PGConf.dev we have a few variations on the organization wording which really should be aligned with the wording defined by -core in pgweb commit 2d764dbc08. -- Daniel Gustafsson v1-0001-Align-organization-wording-in-copyright-statement.patch Description: Binary data
Document default values for pgoutput options + fix missing initialization for "origin"
Hi, The pgoutput plugin options are documented in the logical streaming replication protocol, but their default values are not mentioned. This can be inconvenient for users - for example, when using pg_recvlogical with pgoutput plugin and needing to know the default behavior of each option. https://www.postgresql.org/docs/devel/protocol-logical-replication.html I'd like to propose adding the default values to the documentation to improve clarity and usability. Patch attached (0001 patch). While working on this, I also noticed that although most optional parameters (like "binary") are explicitly initialized in parse_output_parameters(), the "origin" parameter is not. Its value (PGOutputData->publish_no_origin) is implicitly set to false due to zero-initialization, but unlike other parameters, it lacks an explicit default assignment. To ensure consistency and make the behavior clearer, I propose explicitly initializing the "origin" parameter as well. A patch for this is also attached (0002 patch). Thoughts? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION From 60b83897c786c89a1060d9c41adced566b7189b5 Mon Sep 17 00:00:00 2001 From: Fujii Masao Date: Fri, 16 May 2025 23:09:55 +0900 Subject: [PATCH v1 1/2] doc: Document default values for pgoutput options in protocol.sgml. The pgoutput plugin options are described in the logical streaming replication protocol documentation, but their default values were previously not mentioned. This made it less convenient for users, for example, when specifying those options to use pg_recvlogical with pgoutput plugin. This commit adds the explanations of the default values for pgoutput options to improve clarity and usability. --- doc/src/sgml/protocol.sgml | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index c4d3853cbf2..b9fac360fbf 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -3482,6 +3482,7 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;" Boolean option to use binary transfer mode. Binary mode is faster than the text mode but slightly less robust. + The default is false. @@ -3494,6 +3495,7 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;" Boolean option to enable sending the messages that are written by pg_logical_emit_message. + The default is false. @@ -3505,10 +3507,11 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;" Boolean option to enable streaming of in-progress transactions. - It accepts an additional value "parallel" to enable sending extra + It accepts an additional value parallel to enable sending extra information with some messages to be used for parallelisation. Minimum protocol version 2 is required to turn it on. Minimum protocol - version 4 is required for the "parallel" option. + version 4 is required for the parallel option. + The default is false. @@ -3521,6 +3524,7 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;" Boolean option to enable two-phase transactions. Minimum protocol version 3 is required to turn it on. + The default is false. @@ -3537,6 +3541,7 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;" to send the changes regardless of their origin. This can be used to avoid loops (infinite replication of the same data) among replication nodes. + The default is any. 
-- 2.49.0 From 762396c608892a90a6943f9f8bfa017d289075a9 Mon Sep 17 00:00:00 2001 From: Fujii Masao Date: Fri, 16 May 2025 23:53:08 +0900 Subject: [PATCH v1 2/2] pgoutput: Initialize missing default for "origin" parameter. The pgoutput plugin initializes optional parameters like "binary" with default values at the start of processing. However, the "origin" parameter was previously missed and left without explicit initialization. Although the PGOutputData struct, which holds these settings, is zero-initialized at allocation (resulting in publish_no_origin field for "origin" parameter being false by default), this default was not set explicitly, unlike other parameters. This commit adds explicit initialization of the "origin" parameter to ensure consistency and clarity in how defaults are handled. --- src/backend/replication/pgoutput/pgoutput.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c index 693a766e6d7..d7099c060d9 100644 --- a/src/backend/replication/pgoutput/pgoutput.c +++ b/src/backend/replication/pgoutput/pgoutput.c @@ -297,10 +297,12 @@ parse_output_parameters(List *options, PGOutput
Re: Add comment explaining why queryid is int64 in pg_stat_statements
Hi Shaik > While it's true that no arithmetic or logical operations are performed on queryid after the assignment, the overflow technically > occurs at the point of assignment itself. For example, *entry->key.queryid* holds the value *12747288675711951805* as a > *uint64*, but after assigning it to *queryid* (which is an *int64*), it becomes *-5699455397997599811* *due to overflow*. > This conversion is intentional - most likely to match the *bigint* type of the *queryid* column in *pg_stat_statements*. However, > without an explicit comment, this can be misleading. A beginner reading this might misinterpret it as an unintentional overflow > or bug and raise unnecessary concerns. Therefore, it’s worth adding a brief comment clarifying the intent behind this > assignment. +1 agree On Sat, May 17, 2025 at 1:54 AM Shaik Mohammad Mujeeb < mujeeb...@zohocorp.com> wrote: > Hi Ilia Evdokimov, > > While it's true that no arithmetic or logical operations are performed on > queryid after the assignment, the overflow technically occurs at the > point of assignment itself. For example, *entry->key.queryid* holds the > value *12747288675711951805* as a *uint64*, but after assigning it to > *queryid* (which is an *int64*), it becomes *-5699455397997599811* *due > to overflow*. > > This conversion is intentional - most likely to match the *bigint* type > of the *queryid* column in *pg_stat_statements*. However, without an > explicit comment, this can be misleading. A beginner reading this might > misinterpret it as an unintentional overflow or bug and raise unnecessary > concerns. Therefore, it’s worth adding a brief comment clarifying the > intent behind this assignment. > > > > Thanks & Regards, > Shaik Mohammad Mujeeb > Member Technical Staff > Zoho Corp > > > > > > On Fri, 16 May 2025 15:12:41 +0530 *Ilia Evdokimov > >* wrote --- > > > On 15.05.2025 10:08, Shaik Mohammad Mujeeb wrote: > > > I don't think the comment is necessary here. There are no arithmetic or > logical operations performed on it. It is only passed as a Datum. > > -- > Best regards, > Ilia Evdokimov, > Tantor Labs LLC. > > Hi Developers, > > In pg_stat_statements.c, the function *pg_stat_statements_internal()* > declares the *queryid* variable as *int64*, but assigns it a value of > type *uint64*. At first glance, this might appear to be an overflow > issue. However, I think this is intentional - the queryid is cast to > *int64* *to match the bigint type of the queryid column in the > pg_stat_statements view*. > > Please find the attached patch, which adds a clarifying comment to make > the rationale explicit and avoid potential confusion for future readers. > > > > Thanks and Regards, > Shaik Mohammad Mujeeb > Member Technical Staff > Zoho Corp > > > > >
Re: Should we optimize the `ORDER BY random() LIMIT x` case?
Hi, > A custom SRF seems great to me. You may propose such an aggregate in the > core - it seems it doesn't even need any syntax changes. For example: > SELECT * FROM (SELECT sample(q, 10, ) FROM (SELECT ...) AS q); > or something like that. I experimented with this idea a bit and it looks like this is not going to work. In our case sample() should behave both as an aggregate function and as SRF. An aggregate function computes a single result [1]. We probably could bypass this by returning an array but it would be inconvenient for the user. The problem with SRFs is that we never know when SRF is called the last time e.g. there is no SRF_IS_LASTCALL(). So there is no way to aggregate the tuples until we receive the last one and then return the sample. I don't think SRF can know this in a general case (JOINs, SELECT .. LIMIT .., etc). Unless I'm missing something of course. Perhaps somebody smarter than me could show us the exact way to implement sample() as an SRF, but at the moment personally I don't see it. On top of that it's worth noting that the query you showed above wouldn't be such a great UI in my opinion. If I'm right about the limitations of aggregate functions and SRFs this leaves us the following options: 1. Changing the constraints of aggregate functions or SRFs. However I don't think we want to do it for such a single niche scenario. 2. Custom syntax and a custom node. 3. To give up Thoughts? [1]: https://www.postgresql.org/docs/current/functions-aggregate.html -- Best regards, Aleksander Alekseev
Re: Assertion failure in smgr.c when using pg_prewarm with partitioned tables
On 2025/05/16 15:33, Dilip Kumar wrote: On Fri, May 16, 2025 at 11:51 AM Masahiro Ikeda wrote: Thank you for your comments! I updated the patch to use RELKIND_HAS_STORAGE() as done in commit 4623d7144. Please see the v2-0001 patch for the changes. Thanks for updating the patch! You added a test for pg_prewarm() with a partitioned table in the TAP tests. But, wouldn't it be simpler and easier to maintain if we added this as a regular regression test (i.e., using .sql and .out files) instead of TAP tests? That should be sufficient for this case? Also, since the issue was introduced in v17, this patch should be back-patched to v17, right? On 2025-05-15 19:57, Richard Guo wrote: +1. FWIW, not long ago we fixed a similar Assert failure in contrib/pg_freespacemap by verifying RELKIND_HAS_STORAGE() before trying to access the storage (see 4623d7144). Wondering if there are other similar issues elsewhere in contrib ... I tested all contrib functions that take regclass arguments (see attached test.sql and test_result.txt). The result shows that only pg_prewarm() can lead to the assertion failure. Right, even while I was searching, I see this is used in 3 contrib modules, amcheck, autoprewarm, and pg_prewarm. amcheck is already checking for AM type, and Autoprewarm is identifying the relation by block, so there is no chance of getting the relation which do not have storage. However, I found that amcheck's error messages can be misleading when run on partitioned indexes. For example, on the master branch, amcheck (e.g., bt_index_check function) shows this error: > ERROR: expected "btree" index as targets for verification > DETAIL: Relation "pgbench_accounts_pkey" is a btree index. This message says it expects a "btree" index, yet also states the relation is a "btree" index, which can seem contradictory. The actual issue is that the index is a btree partitioned index, but this detail is missing, causing possible confusion. Yes, I found the error message confusing during testing as well, so it makes sense to improve it. +1 Regarding the patch, how about also adding a regression test for amcheck with a partitioned index? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
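For context, the guard pattern from the pg_freespacemap fix referenced above looks roughly like the following. This is a sketch assuming rel is an already-opened Relation, not a hunk copied from the v2 patch:

    /* Reject relations without storage, e.g. partitioned tables and indexes */
    if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind))
        ereport(ERROR,
                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                 errmsg("relation \"%s\" does not have storage",
                        RelationGetRelationName(rel)),
                 errdetail_relkind_not_supported(rel->rd_rel->relkind)));

Placing such a check before any smgr access both avoids the assertion failure and yields a clearer error than the confusing amcheck message discussed above.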
Re: Conflict detection for update_deleted in logical replication
On Thu, May 15, 2025 at 6:02 PM Amit Kapila wrote: > > Few minor comments on 0001: > 1. > + if (TimestampDifferenceExceeds(data->reply_time, > +data->candidate_xid_time, 0)) > + ereport(ERROR, > + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("oldest_nonremovable_xid transaction ID may be advanced > prematurely"), > > Shouldn't this be elog as this is an internal message? And, instead of > "... may be ..", shall we use ".. could be .." in the error message as > the oldest_nonremovable_xid is not yet advanced by this time. > > 2. > + * It's necessary to use FullTransactionId here to mitigate potential race > + * conditions. Such scenarios might occur if the replication slot is not > + * yet created by the launcher while the apply worker has already > + * initialized this field. > > IIRC, we discussed why it isn't easy to close this race condition. Can > we capture that in comments separately? > A few more comments: = 3. maybe_advance_nonremovable_xid(RetainConflictInfoData *data, bool status_received) { /* Exit early if retaining conflict information is not required */ if (!MySubscription->retainconflictinfo) return; /* * It is sufficient to manage non-removable transaction ID for a * subscription by the main apply worker to detect update_deleted conflict * even for table sync or parallel apply workers. */ if (!am_leader_apply_worker()) return; /* Exit early if we have already stopped retaining */ if (MyLogicalRepWorker->stop_conflict_info_retention) return; ... get_candidate_xid() { ... if (!TimestampDifferenceExceeds(data->candidate_xid_time, now, data->xid_advance_interval)) return; Would it be better to encapsulate all these preliminary checks that decide whether we can move to computing oldest_nonremovable_xid in a separate function? The check in get_candidate_xid would require some additional conditions because it is not required in every phase. Additionally, we can move the core phase processing logic to compute in a separate function. We can try this to see if the code looks better with such a refactoring. 4. + /* + * Check if all remote concurrent transactions that were active at the + * first status request have now completed. If completed, proceed to the + * next phase; otherwise, continue checking the publisher status until + * these transactions finish. + */ + if (FullTransactionIdPrecedesOrEquals(data->last_phase_at, + remote_full_xid)) + data->phase = RCI_WAIT_FOR_LOCAL_FLUSH; I think there is a possibility of optimization here for cases where there are no new transactions on the publisher side across the next cycle of advancement of oldest_nonremovable_xid. We can simply set candidate_xid as oldest_nonremovable_xid instead of going into RCI_WAIT_FOR_LOCAL_FLUSH phase. If you want to keep the code simple for the first version, then at least note that down in comments, but OTOH, if it is simple to achieve, then let's do it now. -- With Regards, Amit Kapila.
Make wal_receiver_timeout configurable per subscription
Hi, When multiple subscribers connect to different publisher servers, it can be useful to set different wal_receiver_timeout values for each connection to better detect failures. However, this isn't currently possible, which limits flexibility in managing subscriptions. To address this, I'd like to propose making wal_receiver_timeout configurable per subscription. One approach is to add wal_receiver_timeout as a parameter to CREATE SUBSCRIPTION command, storing it in pg_subscription so each logical replication worker can use its specific value. Another option is to change the wal_receiver_timeout's GUC context from PGC_SIGHUP to PGC_USERSET. This would allow setting different values via ALTER ROLE SET command for each subscription owner - effectively enabling per-subscription configuration. Since this approach is simpler and likely sufficient, I'd prefer starting with this. Thought? BTW, this could be extended in the future to other GUCs used by logical replication workers, such as wal_retrieve_retry_interval. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
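To make the second, simpler option concrete, it essentially amounts to flipping the context field of the wal_receiver_timeout entry in guc_tables.c, roughly as below. The surrounding fields are paraphrased from memory rather than copied from the tree, so treat them as illustrative:

    /* in ConfigureNamesInt[] of src/backend/utils/misc/guc_tables.c */
    {
        {"wal_receiver_timeout", PGC_USERSET, REPLICATION_STANDBY,  /* was PGC_SIGHUP */
            gettext_noop("Sets the maximum wait time to receive data from the sending server."),
            NULL,
            GUC_UNIT_MS
        },
        &wal_receiver_timeout,
        60 * 1000, 0, INT_MAX,
        NULL, NULL, NULL
    },

With PGC_USERSET, a value can then be attached to each subscription owner via ALTER ROLE ... SET wal_receiver_timeout, which is what gives the per-subscription behavior described above.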
Re: Align wording on copyright organization
> On 16 May 2025, at 10:40, Daniel Gustafsson wrote: > I will go ahead and push this to all supported branches. Done. I also ensured that the text in the legal notice match the OSI license text while in there. -- Daniel Gustafsson
Proposal: Make cfbot fail on patches not created by "git format-patch"
In the "Scaling PostgreSQL Development" unconference session. One of the problems that came up was that people don't follow "best practices". The response to that was that people don't know what the best practices are (nor that they are important to follow), because we don't enforce them. Based on the discussion there I'm planning to make the cfbot fail to apply a patch in the following two cases: 1. If a patch is not created by "git format-patch" (but cfbot will still use "patch" to apply the patch in case "git am" fails) 2. If the commit message has no body (so only a title) Does anyone have strong opposition to this? To be clear, it means we don't run CI on patches created by piping "git diff" to a file anymore, as a way to nudge submitters into providing useful commit messages. Communication wise, I plan to show this in the CF app as "Fails apply" (instead of "Needs Rebase"). When clicking the "Fails apply" link, it would then show a log as to why the apply failed. need to be created using git format patch, and should have a descriptive commit message.
Re: PG 17.2 compilation fails with -std=c11 on mac
Lakshmi Narayana Velayudam writes: > When I try to compile postgres 17.2 with -std=c11 I am getting the > below error on mac > explicit_bzero.c:22:9: error: call to undeclared function 'memset_s'; ISO > C99 and later do not support implicit function declarations > [-Wimplicit-function-declaration] Yeah, I can reproduce that. After some digging, I see that the problem is explained by "man memset_s": SYNOPSIS #define __STDC_WANT_LIB_EXT1__ 1 #include <string.h> errno_t memset_s(void *s, rsize_t smax, int c, rsize_t n); We lack that #define, which results in not supplying the declaration. AFAICT the requirement for this is in the C11 standard, this is not just Apple doing something weird. (I did not figure out how come the code compiles without -std=c11.) Aside from this compilation error, we're probably failing to use memset_s on some platforms where it's available, for lack of the #define in our configure/meson probes. I know how to fix that in configure, but I'm less clear on how nonstandard probes work in meson --- any help there? regards, tom lane
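To illustrate the C11 Annex K requirement described above, here is a tiny standalone program (not PostgreSQL code): the declaration of memset_s() only becomes visible when the feature-test macro is defined before including <string.h>, and only on libcs that provide the optional Annex K interfaces at all.

/* must come before the include, per C11 Annex K */
#define __STDC_WANT_LIB_EXT1__ 1
#include <string.h>

int
main(void)
{
    char    secret[16] = "hunter2";

    /* without the #define above, this is an implicit declaration under -std=c11 */
    (void) memset_s(secret, sizeof(secret), 0, sizeof(secret));
    return 0;
}

This mirrors what explicit_bzero.c does, and is also why the configure/meson probes need the same #define in order to detect memset_s correctly.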
Re: Align wording on copyright organization
On Fri, 16 May 2025 at 11:50, Daniel Gustafsson wrote: > > On 16 May 2025, at 10:40, Daniel Gustafsson wrote: > > > I will go ahead and push this to all supported branches. > > Done. I also ensured that the text in the legal notice match the OSI > license > text while in there. > Great, thank you very much. -- Dave Page pgAdmin: https://www.pgadmin.org PostgreSQL: https://www.postgresql.org pgEdge: https://www.pgedge.com
Re: Proposal: Make cfbot fail on patches not created by "git format-patch"
> On 16 May 2025, at 11:52, Jelte Fennema-Nio wrote: > Does anyone have strong opposition to this? To be clear, it means we don't > run CI on patches created by piping "git diff" to a file anymore, as a way > to nudge submitters into providing useful commit messages. Disclaimer: I wasn't in the session (due to conflicting interesting sessions) so I might be raising points/questions already answered. Is this really lowering the bar for new contributors? I've always held "be liberal in what you accept" as a gold standard for projects I'm involved in, to remove barriers to entry. Good commit messages are obviously very important, but having your patch rejected (yes, I know, failing to apply) might not be strongest motivator for achieving this. -- Daniel Gustafsson
Re: C extension compilation failed while using PG 17.2 on mac m1
Lakshmi Narayana Velayudam writes: > Hello, while compiling my C extension with PG 17.2 I am getting > postgresql-17.2/pgsql/include/server/port/pg_iovec.h:93:10: error: call to > undeclared function 'pwritev'; ISO C99 and later do not support implicit > function declarations [-Wimplicit-function-declaration] I failed to reproduce this. I wonder if you have some kind of version skew problem --- SDK not matching underlying system, for instance. preadv and pwritev do have some weirdness on Apple platforms, but we should handle that on any PG >= v15 (cf. commit f014b1b9b). regards, tom lane
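If it helps to rule out SDK/toolchain skew, here is a tiny hypothetical test program (not from the thread) that should compile cleanly under the same -std=c11 flags whenever the SDK actually declares pwritev:

    #include <sys/uio.h>    /* where macOS and most platforms declare preadv/pwritev */

    int
    main(void)
    {
        struct iovec iov = {0};

        /* if the SDK lacks a pwritev declaration, this triggers the same
         * implicit-function-declaration error reported above */
        return (int) pwritev(-1, &iov, 1, 0);
    }

If this small file fails the same way while a PostgreSQL build itself succeeds, that would point toward the extension being compiled against a different SDK than the server was.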
Re: wrong query results on bf leafhopper
There are two different scans in those diffs ("Seq Scan on tenk1 t2" and "Seq Scan on tenk1 t1"), so might it not be a bug? On 16.05.2025 09:19, Andres Freund wrote: Hi, I noticed this recent BF failure: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=leafhopper&dt=2025-05-15%2008%3A10%3A04 === dumping /home/bf/proj/bf/build-farm-17/HEAD/pgsql.build/src/test/recovery/tmp_check/regression.diffs === diff -U3 /home/bf/proj/bf/build-farm-17/HEAD/pgsql.build/src/test/regress/expected/memoize.out /home/bf/proj/bf/build-farm-17/HEAD/pgsql.build/src/test/recovery/tmp_check/results/memoize.out --- /home/bf/proj/bf/build-farm-17/HEAD/pgsql.build/src/test/regress/expected/memoize.out 2025-05-15 08:10:04.211926695 + +++ /home/bf/proj/bf/build-farm-17/HEAD/pgsql.build/src/test/recovery/tmp_check/results/memoize.out 2025-05-15 08:18:29.117733601 + @@ -42,7 +42,7 @@ -> Nested Loop (actual rows=1000.00 loops=N) -> Seq Scan on tenk1 t2 (actual rows=1000.00 loops=N) Filter: (unique1 < 1000) - Rows Removed by Filter: 9000 + Rows Removed by Filter: 8982 -> Memoize (actual rows=1.00 loops=N) Cache Key: t2.twenty Cache Mode: logical @@ -178,7 +178,7 @@ -> Nested Loop (actual rows=1000.00 loops=N) -> Seq Scan on tenk1 t1 (actual rows=1000.00 loops=N) Filter: (unique1 < 1000) - Rows Removed by Filter: 9000 + Rows Removed by Filter: 8981 -> Memoize (actual rows=1.00 loops=N) Cache Key: t1.two, t1.twenty Cache Mode: binary For a moment I thought this could be a bug in memoize, but that doesn't actually make sense - the failure isn't in memoize, it's the seqscan. Subsequently I got worried that this is an AIO bug or such causing wrong query results. But there are instances of this error well before AIO was merged. E.g. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=leafhopper&dt=2024-12-18%2023%3A35%3A04 The same error is also present down to 16. In 15, I saw a potentially related error https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=leafhopper&dt=2024-12-16%2023%3A43%3A03 There have been other odd things on leafhopper, see e.g.: https://www.postgresql.org/message-id/35d87371-f3ab-42c8-9aac-bb39ab5bd987%40gmail.com https://postgr.es/m/Z4npAKvchWzKfb_r%40paquier.xyz Greetings, Andres Freund -- Regards, Alena Rybakina Postgres Professional
Re: Proposal: Make cfbot fail on patches not created by "git format-patch"
Jelte Fennema-Nio writes: > Based on the discussion there I'm planning to make the cfbot fail to apply > a patch in the following two cases: > ... > To be clear, it means we don't > run CI on patches created by piping "git diff" to a file anymore, as a way > to nudge submitters into providing useful commit messages. That outcome seems entirely horrible to me. If you want to flag the lack of a commit message somehow, fine, but don't prevent CI from running. regards, tom lane
Re: Conflict detection for update_deleted in logical replication
On Fri, May 16, 2025 at 12:17 PM Amit Kapila wrote: > > On Fri, Apr 25, 2025 at 10:08 AM shveta malik wrote: > > > > On Thu, Apr 24, 2025 at 6:11 PM Zhijie Hou (Fujitsu) > > wrote: > > > > > > Few comments for patch004: > > > > Config.sgml: > > > > 1) > > > > + > > > > +Maximum duration (in milliseconds) for which conflict > > > > +information can be retained for conflict detection by the > > > > apply worker. > > > > +The default value is 0, indicating that > > > > conflict > > > > +information is retained until it is no longer needed for > > > > detection > > > > +purposes. > > > > + > > > > > > > > IIUC, the above is not entirely accurate. Suppose the subscriber > > > > manages to > > > > catch up and sets oldest_nonremovable_xid to 100, which is then updated > > > > in > > > > slot. After this, the apply worker takes a nap and begins a new xid > > > > update cycle. > > > > Now, let’s say the next candidate_xid is 200, but this time the > > > > subscriber fails > > > > to keep up and exceeds max_conflict_retention_duration. As a result, it > > > > sets > > > > oldest_nonremovable_xid to InvalidTransactionId, and the launcher skips > > > > updating the slot’s xmin. > > > > > > If the time exceeds the max_conflict_retention_duration, the launcher > > > would > > > invalidate the slot, instead of skipping updating it. So the conflict > > > info (e.g., > > > dead tuples) would not be retained anymore. > > > > > > > The launcher will not invalidate the slot until all subscriptions have > > stopped conflict_info retention. So info about dead tuples for a > > particular oldest_xmin of a particular apply worker could be retained > > for much longer than this configured duration. If other apply workers > > are actively working (catching up with the primary), then they should keep > > on advancing the xmin of the shared slot, but if the xmin of the shared slot remains the > > same for, say, 15min+15min+15min for 3 apply workers (assuming they are > > marking themselves with stop_conflict_retention one after the other and the > > xmin of the slot has not been advanced), then the first apply worker > > having marked itself with stop_conflict_retention still has access to > > the oldest_xmin's data for 45 mins instead of 15 mins (where > > max_conflict_retention_duration=15 mins). Please let me know if my > > understanding is wrong. > > > > IIUC, the current code will stop updating the slot even if one of the > apply workers has set stop_conflict_info_retention. The other apply > workers will keep on maintaining their oldest_nonremovable_xid without > advancing the slot. If this is correct, then what behavior do we > expect here instead? I think this is not the current behaviour. > Do we want the slot to keep advancing till any worker is > actively maintaining oldest_nonremovable_xid? In fact, this one is the current behaviour of the v30 patch. > To some extent, this > matches with the cases where the user has set retain_conflict_info for > some subscriptions but not for others. > > If so, how will users eventually know for which tables they can expect > to reliably detect update_delete? One possibility is that users can > check which apply workers have stopped maintaining > oldest_nonremovable_xid via the pg_stat_subscription view and then see the > tables corresponding to those subscriptions. Yes, it is a possibility, but I feel it will be too much to monitor from the user's perspective. > Also, what will we do as > part of the resolutions in the apply workers where > stop_conflict_info_retention is set? > Shall we simply LOG that we can't > resolve this conflict and continue till the user takes some action, or > simply error out in such cases? We can LOG. Erroring out again will prevent the subscriber from proceeding, and the subscriber initially reached this state due to falling behind, which led to stop_conflict_retention=true. But still, if we go with erroring out, I am not sure what action users can take in this situation. The subscriber is still lagging, and if the user recreates the slot as a solution, the apply worker will soon go back to the 'stop_conflict_retention=true' state, provided the subscriber is still not able to catch up. thanks Shveta
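To make the behaviour under discussion easier to follow (the shared slot's xmin keeps advancing as long as at least one apply worker is still retaining conflict info), here is a rough illustrative sketch; the struct and field names only mirror the thread's terminology and are not the actual patch code:

    #include "postgres.h"
    #include "access/transam.h"

    /* Illustrative type only; the real patch tracks this per apply worker. */
    typedef struct WorkerRetentionState
    {
        bool        stop_conflict_retention;    /* worker gave up retaining conflict info */
        TransactionId oldest_nonremovable_xid;  /* oldest xid this worker still needs */
    } WorkerRetentionState;

    /*
     * Compute the xmin the launcher would use for the shared slot, ignoring
     * workers that have stopped conflict-info retention.  Under this scheme
     * the slot can keep advancing even while some workers have given up,
     * which is the behaviour debated above.
     */
    static TransactionId
    compute_shared_slot_xmin(const WorkerRetentionState *workers, int nworkers)
    {
        TransactionId xmin = InvalidTransactionId;

        for (int i = 0; i < nworkers; i++)
        {
            if (workers[i].stop_conflict_retention ||
                !TransactionIdIsValid(workers[i].oldest_nonremovable_xid))
                continue;

            if (!TransactionIdIsValid(xmin) ||
                TransactionIdPrecedes(workers[i].oldest_nonremovable_xid, xmin))
                xmin = workers[i].oldest_nonremovable_xid;
        }

        return xmin;
    }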
Re: Conflict detection for update_deleted in logical replication
On Fri, May 16, 2025 at 2:40 PM Amit Kapila wrote: > > On Fri, May 16, 2025 at 12:01 PM shveta malik wrote: > > > > On Fri, May 16, 2025 at 11:15 AM Amit Kapila > > wrote: > > > > > > > > > BTW, another related point is that when we decide to stop retaining > > > dead tuples (via should_stop_conflict_info_retention), should we also > > > consider the case that the apply worker didn't even try to get the > > > publisher status because previously it decided that > > > oldest_nonremovable_xid cannot be advanced due to its > > > OldestActiveTransactionId? > > > > > > > Do you mean avoid stop-conflict-retention in such a case, since the apply > > worker itself did not request status from the publisher? If I > > understood your point correctly, then we can do that by advancing the > > timer to a new value even if we did not update candidate_xid and did > > not ask for the status from the publisher. > > > > But candidate_xid_time is also used in wait_for_local_flush() to check the > clock_skew between publisher and subscriber, so for that purpose, it > is better to set it along with candidate_xid. However, can't we rely > on a valid value of candidate_xid to ensure that the apply worker didn't > send any request? Note that we always reset candidate_xid once we have > updated oldest_nonremovable_xid. > I think this is automatically taken care of, because we call should_stop_conflict_info_retention() only during the 'wait' phase, which happens after candidate_xid is set. Having said that, we should have an assert for candidate_xid in should_stop_conflict_info_retention() and also note in the comments that it should be called only during the 'wait' phase. Additionally, we could add an assert that should_stop_conflict_info_retention() is indeed called only during the 'wait' phase. -- With Regards, Amit Kapila.
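For concreteness, a hypothetical sketch of the suggested assert; the struct, field, and GUC names follow the thread's terminology and may not match the actual patch:

    #include "postgres.h"
    #include "access/transam.h"
    #include "utils/timestamp.h"

    /* Hypothetical declarations mirroring the names used in this thread */
    typedef struct RetainConflictInfoData
    {
        TransactionId candidate_xid;
        TimestampTz candidate_xid_time;
    } RetainConflictInfoData;

    static int  max_conflict_retention_duration = 0;   /* assumed GUC, in ms; 0 = no limit */

    /*
     * Should be called only during the 'wait' phase, i.e. after a
     * candidate_xid has been chosen, so assert that up front.
     */
    static bool
    should_stop_conflict_info_retention(RetainConflictInfoData *rci)
    {
        Assert(TransactionIdIsValid(rci->candidate_xid));

        /* stop retaining conflict info once the configured duration is exceeded */
        return max_conflict_retention_duration > 0 &&
            TimestampDifferenceExceeds(rci->candidate_xid_time,
                                       GetCurrentTimestamp(),
                                       max_conflict_retention_duration);
    }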
Re: Align wording on copyright organization
> On 16 May 2025, at 10:00, Tom Lane wrote: > > Dave Page writes: >> On Fri, 16 May 2025 at 09:12, Daniel Gustafsson wrote: >>> As was briefly discussed in the developers meeting, and the hallway track, >>> at >>> PGConf.dev we have a few variations on the organization wording which >>> really >>> should be aligned with the wording defined by -core in pgweb commit >>> 2d764dbc08. > >> Thanks Daniel - the patch looks good to me. >> You'll push this to all active branches, correct? > > +1 --- thanks Daniel, I had this on my to-do list also. Thanks for the review, I will go ahead and push this to all supported branches. -- Daniel Gustafsson