Re: Change GUC hashtable to use simplehash?
On Sun, Dec 10, 2023 at 2:18 AM Jeff Davis wrote:
>
> On Sat, 2023-12-09 at 18:52 +0700, John Naylor wrote:
> > I tested using the new hash function APIs for my search path cache,
> > and there's a significant speedup for cases not benefiting from
> > a86c61c9ee. It's enough that we almost don't need a86c61c9ee.
> > So a definite +1 to the new APIs.
>
> Do you have a new test?
>
> Still using the same basic test here:
>
> https://www.postgresql.org/message-id/04c8592dbd694e4114a3ed87139a7a04e4363030.camel%40j-davis.com
>
> What I did is:
>
> a. add your v5 patches
> b. disable optimization in a86c61c9ee
> c. add attached patch to use new hash APIs
>
> I got a slowdown between (a) and (b), and then (c) closed the gap about
> halfway. It started to get close to test noise at that point -- I could
> get some better numbers out of it if it's helpful.

I tried my variant of the same test [1] (but only 20 seconds per run),
which uses pgbench to take the average of a few dozen runs, and doesn't
use table I/O (when doing that, it's best to pre-warm the buffers to
reduce noise).

pgbench -n -T 20 -f bench.sql -M prepared

(done three times and take the median, with turbo off)

* master at 457428d9e99b6b from Dec 4:
latency average = 571.413 ms

* v8 (bytewise hash):
latency average = 588.942 ms

This regression is a bit surprising, since there is no strlen call, and
it uses roleid as a seed without a round of mixing (not sure if we
should do that, but just trying to verify results).

* v8 with chunked interface:
latency average = 555.688 ms

This starts to improve things for me.

* v8 with chunked, and return lower 32 bits of full 64-bit hash:
latency average = 556.324 ms

This is within the noise level. There doesn't seem to be much downside
of using a couple cycles for fasthash's 32-bit reduction.

* revert back to master from Dec 4 and then cherry pick a86c61c9ee
(save last entry of SearchPathCache):
latency average = 545.747 ms

So chunked incremental hashing gets within ~2% of that, which is nice.
It seems we should use that when removing strlen, when convenient.

Updated next steps:
* Investigate whether/how to incorporate final length into the
  calculation when we don't have the length up front.
* Add some desperately needed explanatory comments.
* Use this in some existing cases where it makes sense.
* Get back to GUC hash and dynahash.

[1] https://www.postgresql.org/message-id/CANWCAZY7Cr-GdUhrCLoR4%2BJGLChTb0pQxq9ZPi1KTLs%2B_KDFqg%40mail.gmail.com
Re: Adding facility for injection points (or probe points?) for more advanced tests
On Mon, Dec 11, 2023 at 3:14 PM Michael Paquier wrote:
>
> On Mon, Dec 11, 2023 at 11:09:45AM +0530, Dilip Kumar wrote:
> > I haven't specifically done a review or testing of this patch, but I
> > have used this for testing the CLOG group update code with my
> > SLRU-specific changes and I found it quite helpful to test some of the
> > concurrent areas where you need to stop processing somewhere in the
> > middle of the code. Testing that area without this kind of injection
> > point framework is really difficult, or may not even be possible. We
> > wanted to test the case of clog group update where we can get multiple
> > processes added to a single group and get the xid status updated by the
> > group leader; you can refer to my test in that thread[1] (the last patch,
> > test_group_commit.patch, is using this framework for testing).
>
> Could you be more specific? test_group_commit.patch includes this
> line but there is nothing specific about this injection point getting
> used in a test or a callback assigned to it:
> ./test_group_commit.patch:+ INJECTION_POINT("ClogGroupCommit");

Oops, I only included the code changes where I am adding injection
points and some comments to verify that, but missed the actual test
file. Attaching it here.

Note: I think the latest patches are conflicting with the head, can
you rebase?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

# Test clog group update
use strict;
use warnings;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;

my $node = PostgreSQL::Test::Cluster->new('node');
$node->init(allows_streaming => 'logical');
$node->start;

$node->safe_psql('postgres', 'CREATE EXTENSION test_injection_points;');
$node->safe_psql('postgres', 'CREATE TABLE test(a int);');

# Consume multiple xids so that next xids get generated in new banks
$node->safe_psql(
	'postgres', q{
do $$
begin
  for i in 1..128001 loop
    -- use an exception block so that each iteration eats an XID
    begin
      insert into test values (i);
    exception
      when division_by_zero then null;
    end;
  end loop;
end$$;
});

# Attach to the injection point
$node->safe_psql('postgres',
	"SELECT test_injection_points_attach('ClogGroupCommit', 'wait');");

# First session will get the slru lock and will wait on injection point
my $session1 = $node->background_psql('postgres');

$session1->query_until(
	qr/start/, q(
\echo start
INSERT INTO test VALUES(1);
));

# Create another 4 sessions which will not get the lock as the first
# session is holding that lock, so these all will go for group update
my $session2 = $node->background_psql('postgres');

$session2->query_until(
	qr/start/, q(
\echo start
INSERT INTO test VALUES(2);
));

my $session3 = $node->background_psql('postgres');

$session3->query_until(
	qr/start/, q(
\echo start
INSERT INTO test VALUES(3);
));

my $session4 = $node->background_psql('postgres');

$session4->query_until(
	qr/start/, q(
\echo start
INSERT INTO test VALUES(4);
));

my $session5 = $node->background_psql('postgres');

$session5->query_until(
	qr/start/, q(
\echo start
INSERT INTO test VALUES(5);
));

# Now wake up the first session and let the next 4 sessions perform the
# group update
$node->safe_psql('postgres', "SELECT test_injection_points_wake();");
$node->safe_psql('postgres',
	"SELECT test_injection_points_detach('ClogGroupCommit');");

done_testing();
Re: Improve eviction algorithm in ReorderBuffer
On Tue, Dec 12, 2023 at 9:01 AM Masahiko Sawada wrote: > > Hi all, > > As the comment of ReorderBufferLargestTXN() says, it's very slow with > many subtransactions: > > /* > * Find the largest transaction (toplevel or subxact) to evict (spill to > disk). > * > * XXX With many subtransactions this might be quite slow, because we'll have > * to walk through all of them. There are some options how we could improve > * that: (a) maintain some secondary structure with transactions sorted by > * amount of changes, (b) not looking for the entirely largest transaction, > * but e.g. for transaction using at least some fraction of the memory limit, > * and (c) evicting multiple transactions at once, e.g. to free a given > portion > * of the memory limit (e.g. 50%). > */ > > This is because the reorderbuffer has transaction entries for each > top-level and sub transaction, and ReorderBufferLargestTXN() walks > through all transaction entries to pick the transaction to evict. > I've heard the report internally that replication lag became huge when > decoding transactions each consisting of 500k sub transactions. Note > that ReorderBufferLargetstTXN() is used only in non-streaming mode. > > Here is a test script for a many subtransactions scenario. In my > environment, the logical decoding took over 2min to decode one top > transaction having 100k subtransctions. > > - > create table test (c int); > create or replace function testfn (cnt int) returns void as $$ > begin > for i in 1..cnt loop > begin > insert into test values (i); > exception when division_by_zero then > raise notice 'caught error'; > return; > end; > end loop; > end; > $$ > language plpgsql; > select testfn(10) > set logical_decoding_work_mem to '4MB'; > select count(*) from pg_logical_slot_peek_changes('s', null, null) > > > To deal with this problem, I initially thought of the idea (a) > mentioned in the comment; use a binary heap to maintain the > transactions sorted by the amount of changes or the size. But it seems > not a good idea to try maintaining all transactions by its size since > the size of each transaction could be changed frequently. > > The attached patch uses a different approach that consists of three > strategies; (1) maintain the list of transactions whose size is larger > than 10% of logical_decoding_work_mem, and preferentially evict a > transaction from this list. If the list is empty, all transactions are > small enough, (2) so we evict the oldest top-level transaction from > rb->toplevel_by_lsn list. Evicting older transactions would help in > freeing memory blocks in GenerationContext. Finally, if this is also > empty, (3) we evict a transaction that size is > 0. Here, we need to > note the fact that even if a transaction is evicted the > ReorderBufferTXN entry is not removed from rb->by_txn but its size is > 0. In the worst case where all (quite a few) transactions are smaller > than 10% of the memory limit, we might end up checking many > transactions to find non-zero size transaction entries to evict. So > the patch adds a new list to maintain all transactions that have at > least one change in memory. > > Summarizing the algorithm I've implemented in the patch, > > 1. pick a transaction from the list of large transactions (larger than > 10% of memory limit). > 2. pick a transaction from the top-level transaction list in LSN order. > 3. pick a transaction from the list of transactions that have at least > one change in memory. 
>
> With the patch, the above test case completed within 3 seconds in my
> environment.

Thanks for working on this. I think it would be good to test other
scenarios as well, where this might have some negative impact, and see
where we stand. I mean:

1) A scenario where you have one very large transaction consuming ~40%
of the memory and 5-6 comparatively smaller transactions that are just
above 10% of the memory limit. To get back under the memory limit we
would then evict several of those smaller transactions instead of
evicting the single large transaction once.

2) Another scenario where all the transactions are under 10% of the
memory limit, but some transactions are consuming around 8-9% of the
memory limit each while not being very old, whereas there are many old
transactions which are fairly small and consume under 1% of the memory
limit each. How would it be affected if we frequently select many of
these small transactions to get under the memory limit instead of
selecting a couple of the large transactions consuming 8-9%?

> As a side note, the idea (c) mentioned in the comment, evicting
> multiple transactions at once to free a given portion of the memory,
> would also help in avoiding back and forth the memory threshold. It's
> also worth considering.

Yes, I think it is worth considering.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Improve eviction algorithm in ReorderBuffer
Hi all,

As the comment of ReorderBufferLargestTXN() says, it's very slow with
many subtransactions:

/*
 * Find the largest transaction (toplevel or subxact) to evict (spill to disk).
 *
 * XXX With many subtransactions this might be quite slow, because we'll have
 * to walk through all of them. There are some options how we could improve
 * that: (a) maintain some secondary structure with transactions sorted by
 * amount of changes, (b) not looking for the entirely largest transaction,
 * but e.g. for transaction using at least some fraction of the memory limit,
 * and (c) evicting multiple transactions at once, e.g. to free a given portion
 * of the memory limit (e.g. 50%).
 */

This is because the reorderbuffer has transaction entries for each
top-level and sub transaction, and ReorderBufferLargestTXN() walks
through all transaction entries to pick the transaction to evict. I've
heard the report internally that replication lag became huge when
decoding transactions each consisting of 500k subtransactions. Note
that ReorderBufferLargestTXN() is used only in non-streaming mode.

Here is a test script for a many-subtransactions scenario. In my
environment, the logical decoding took over 2min to decode one top
transaction having 100k subtransactions.

-----
create table test (c int);
create or replace function testfn (cnt int) returns void as $$
begin
  for i in 1..cnt loop
    begin
      insert into test values (i);
    exception when division_by_zero then
      raise notice 'caught error';
      return;
    end;
  end loop;
end;
$$
language plpgsql;
select testfn(10)
set logical_decoding_work_mem to '4MB';
select count(*) from pg_logical_slot_peek_changes('s', null, null)
-----

To deal with this problem, I initially thought of the idea (a)
mentioned in the comment; use a binary heap to maintain the
transactions sorted by the amount of changes or the size. But it seems
not a good idea to try maintaining all transactions by their size,
since the size of each transaction could change frequently.

The attached patch uses a different approach that consists of three
strategies: (1) maintain the list of transactions whose size is larger
than 10% of logical_decoding_work_mem, and preferentially evict a
transaction from this list. If the list is empty, all transactions are
small enough, (2) so we evict the oldest top-level transaction from the
rb->toplevel_by_lsn list. Evicting older transactions would help in
freeing memory blocks in GenerationContext. Finally, if this is also
empty, (3) we evict a transaction whose size is > 0. Here, we need to
note the fact that even if a transaction is evicted, the
ReorderBufferTXN entry is not removed from rb->by_txn but its size is
0. In the worst case where all (quite a few) transactions are smaller
than 10% of the memory limit, we might end up checking many
transactions to find non-zero size transaction entries to evict. So
the patch adds a new list to maintain all transactions that have at
least one change in memory.

Summarizing the algorithm I've implemented in the patch:

1. pick a transaction from the list of large transactions (larger than
   10% of memory limit).
2. pick a transaction from the top-level transaction list in LSN order.
3. pick a transaction from the list of transactions that have at least
   one change in memory.

With the patch, the above test case completed within 3 seconds in my
environment.

As a side note, the idea (c) mentioned in the comment, evicting
multiple transactions at once to free a given portion of the memory,
would also help in avoiding bouncing back and forth across the memory
threshold.
It's also worth considering.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

improve_eviction_rb_poc.patch
Description: Binary data
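To make the three-step selection order above concrete, here is a minimal,
self-contained C sketch of the choice being described. The structures and
the function name (ToyTxn, pick_txn_to_evict) are made up for illustration,
and the real patch keeps the candidates on dedicated lists instead of
scanning every entry; this is only a sketch of the decision order, not the
ReorderBuffer code.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct ToyTxn
{
	int			id;
	size_t		size;			/* bytes of changes currently in memory */
	unsigned	first_lsn;		/* stand-in for the transaction's start LSN */
	bool		is_toplevel;
} ToyTxn;

static ToyTxn *
pick_txn_to_evict(ToyTxn *txns, int ntxns, size_t mem_limit)
{
	ToyTxn	   *oldest_top = NULL;
	ToyTxn	   *any_nonzero = NULL;

	for (int i = 0; i < ntxns; i++)
	{
		ToyTxn	   *txn = &txns[i];

		/* Step 1: any transaction above 10% of the memory limit wins. */
		if (txn->size > mem_limit / 10)
			return txn;

		/* Step 2 candidate: oldest top-level transaction, by LSN. */
		if (txn->is_toplevel && txn->size > 0 &&
			(oldest_top == NULL || txn->first_lsn < oldest_top->first_lsn))
			oldest_top = txn;

		/* Step 3 candidate: any transaction with changes still in memory. */
		if (any_nonzero == NULL && txn->size > 0)
			any_nonzero = txn;
	}

	return oldest_top != NULL ? oldest_top : any_nonzero;
}

int
main(void)
{
	ToyTxn		txns[] = {
		{1, 100, 10, true},		/* small top-level, oldest */
		{2, 300, 20, false},	/* small subtransaction */
		{3, 200, 30, true},		/* small top-level, newer */
	};
	ToyTxn	   *victim = pick_txn_to_evict(txns, 3, 4096);

	/* No transaction exceeds 10% of the limit, so the oldest top-level
	 * transaction (id 1) is chosen by step 2. */
	printf("evict txn %d\n", victim ? victim->id : -1);
	return 0;
}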
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Mon, Dec 11, 2023 at 1:18 PM Masahiko Sawada wrote:
> I've attached the updated patch set. From the previous patch set, I've
> merged patches 0007 to 0010. The other changes such as adding RT_GET()
> still are unmerged for now, for discussion. Probably we can make them
> as follow-up patches as we discussed. 0011 to 0015 patches are new
> changes for v44 patch set, which removes RT_SEARCH() and RT_SET() and
> support variable-length values.

This looks like the right direction, and I'm pleased it's not much
additional code on top of my last patch.

v44-0014:

+#ifdef RT_VARLEN_VALUE
+	/* XXX: need to choose block sizes? */
+	tree->leaf_ctx = AllocSetContextCreate(ctx,
+										   "radix tree leaves",
+										   ALLOCSET_DEFAULT_SIZES);
+#else
+	tree->leaf_ctx = SlabContextCreate(ctx,
+									   "radix tree leaves",
+									   RT_SLAB_BLOCK_SIZE(sizeof(RT_VALUE_TYPE)),
+									   sizeof(RT_VALUE_TYPE));
+#endif			/* RT_VARLEN_VALUE */

Choosing block size: Similar to what we've discussed previously around
DSA segments, we might model this on CreateWorkExprContext() in
src/backend/executor/execUtils.c. Maybe tid store can pass
maint_w_m / autovac_w_m (later work_mem for bitmap scan). RT_CREATE
could set the max block size to 1/16 of that, or less.

Also, it occurred to me that compile-time embeddable values don't need
a leaf context. I'm not sure how many places assume that there is
always a leaf context. If not many, it may be worth not creating one
here, just to be tidy.

+	size_t		copysize;

-	memcpy(leaf.local, value_p, sizeof(RT_VALUE_TYPE));
+	copysize = sizeof(RT_VALUE_TYPE);
+#endif
+
+	memcpy(leaf.local, value_p, copysize);

I'm not sure this indirection adds clarity. I guess the intent was to
keep from saying "memcpy" twice, but now the code has to say
"copysize = foo" twice.

For varlen case, we need to watch out for slowness because of memcpy.
Let's put that off for later testing, though. We may someday want to
avoid a memcpy call for the varlen case, so let's keep it flexible
here.

v44-0015:

+#define SizeOfBlocktableEntry (offsetof(

Unused.

+	char		buf[MaxBlocktableEntrySize] = {0};

Zeroing this buffer is probably going to be expensive. Also see this
pre-existing comment:

	/* WIP: slow, since it writes to memory for every bit */
	page->words[wordnum] |= ((bitmapword) 1 << bitnum);

For this function (which will be vacuum-only, so we can assume
ordering), in the loop we can:
* declare the local bitmapword variable to be zero
* set the bits on it
* write it out to the right location when done.

Let's fix both of these at once.

+	if (TidStoreIsShared(ts))
+		shared_rt_set(ts->tree.shared, blkno, (void *) page, page_len);
+	else
+		local_rt_set(ts->tree.local, blkno, (void *) page, page_len);

Is there a reason for "void *"? The declared parameter is
"RT_VALUE_TYPE *value_p" in 0014.

Also, since this function is for vacuum (and other uses will need a
new function), let's assert the returned bool is false.

Does iteration still work? If so, it's not too early to re-wire this
up with vacuum and see how it behaves.

Lastly, my compiler has a warning that CI doesn't have:

In file included from ../src/test/modules/test_radixtree/test_radixtree.c:121:
../src/include/lib/radixtree.h: In function ‘rt_find.isra’:
../src/include/lib/radixtree.h:2142:24: warning: ‘slot’ may be used uninitialized [-Wmaybe-uninitialized]
 2142 |         return (RT_VALUE_TYPE*) slot;
      |                                ^
../src/include/lib/radixtree.h:2112:23: note: ‘slot’ was declared here
 2112 |         RT_PTR_ALLOC *slot;
      |                       ^~~~
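The "accumulate into a local bitmapword, then store once per word" fix
suggested above might look roughly like the standalone sketch below. The
typedef and function name here are stand-ins (the real bitmapword lives in
the bitmapset headers), and the sketch assumes the offsets arrive sorted,
which is the ordering assumption mentioned for the vacuum-only path.

#include <stdint.h>
#include <stdio.h>

typedef uint64_t bitmapword;
#define BITS_PER_BITMAPWORD 64

/*
 * Set bits for a sorted array of offsets, writing each bitmapword to the
 * output array once, instead of OR-ing into memory for every single bit.
 */
static void
set_bits_batched(bitmapword *words, const int *offsets, int noffsets)
{
	int			i = 0;

	while (i < noffsets)
	{
		int			wordnum = offsets[i] / BITS_PER_BITMAPWORD;
		bitmapword	w = 0;

		/* accumulate all bits belonging to this word in a local variable */
		while (i < noffsets && offsets[i] / BITS_PER_BITMAPWORD == wordnum)
		{
			w |= (bitmapword) 1 << (offsets[i] % BITS_PER_BITMAPWORD);
			i++;
		}

		/* single store per word */
		words[wordnum] = w;
	}
}

int
main(void)
{
	bitmapword	words[2] = {0, 0};
	int			offsets[] = {1, 5, 63, 64, 70};	/* must be sorted */

	set_bits_batched(words, offsets, 5);
	printf("%llx %llx\n",
		   (unsigned long long) words[0],
		   (unsigned long long) words[1]);
	return 0;
}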
RE: [Proposal] Add foreign-server health checks infrastructure
Dear Shubham,

> I am failing to apply the latest
> Patch-"v39-0001-postgres_fdw-add-postgres_fdw_verify_connection-.patch"
> for review. Please find the error I am facing:
>
> D:\Project\Postgres>git am D:\Project\Patch\v39-0001-postgres_fdw-add-postgres_fdw_verify_connection-.patch
> error: patch failed: contrib/postgres_fdw/connection.c:117
> error: contrib/postgres_fdw/connection.c: patch does not apply
> hint: Use 'git am --show-current-patch=diff' to see the failed patch
> Applying: postgres_fdw: add postgres_fdw_verify_connection variants
> Patch failed at 0001 postgres_fdw: add postgres_fdw_verify_connection variants
> When you have resolved this problem, run "git am --continue".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".

Oh, good catch. Here is a new patch set. There are no new changes from v39.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

v40-0001-postgres_fdw-add-postgres_fdw_verify_connection-.patch
Description: v40-0001-postgres_fdw-add-postgres_fdw_verify_connection-.patch

v40-0002-add-test.patch
Description: v40-0002-add-test.patch

v40-0003-Extend-postgres_fdw_get_connections-to-return-us.patch
Description: v40-0003-Extend-postgres_fdw_get_connections-to-return-us.patch
RE: Make COPY format extendable: Extract COPY TO format implementations
Dear Sutou-san, Junwang,

Sorry for the delayed reply.

> Can we discuss how to proceed this improvement?
>
> There are 2 approaches for it:
>
> 1. Do the followings concurrently:
>    a. Implementing small changes that got a consensus and
>       merge them step-by-step
>       (e.g. We got a consensus that we need to extract the
>       current format related routines.)
>    b. Discuss design
>
>    (v1-v3 patches use this approach.)
>
> 2. Implement one (large) complete patch set with design
>    discussion and merge it
>
>    (v4- patches use this approach.)
>
> Which approach is preferred? (Or should we choose another
> approach?)
>
> I thought that 1. is preferred because it will reduce review
> cost. So I chose 1.

I'm OK with using approach 1, but could you please divide the large
patch? E.g.,

0001. defines an infrastructure for the copy API
0002. adjusts current code to use the API
0003. adds a test module in src/test/modules or contrib
...

This approach helps reviewers to look at patches more deeply. Separated
patches can be combined when they are close to committable.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Re: Make COPY format extendable: Extract COPY TO format implementations
On Mon, Dec 11, 2023 at 10:32 PM Masahiko Sawada wrote: > > On Mon, Dec 11, 2023 at 7:19 PM Michael Paquier wrote: > > > > On Mon, Dec 11, 2023 at 10:57:15AM +0900, Masahiko Sawada wrote: > > > IIUC we cannot create two same name functions with the same arguments > > > but a different return value type in the first place. It seems to me > > > to be an overkill to change such a design. > > > > Agreed to not touch the logictics of LookupFuncName() for the sake of > > this thread. I have not checked the SQL specification, but I recall > > that there are a few assumptions from the spec embedded in the lookup > > logic particularly when it comes to specify a procedure name without > > arguments. > > > > > Another idea is to encapsulate copy_to/from_handler by a super class > > > like copy_handler. The handler function is called with an argument, > > > say copyto, and returns copy_handler encapsulating either > > > copy_to/from_handler depending on the argument. > > > > Yep, that's possible as well and can work as a cross-check between the > > argument and the NodeTag assigned to the handler structure returned by > > the function. > > > > At the end, the final result of the patch should IMO include: > > - Documentation about how one can register a custom copy_handler. > > - Something in src/test/modules/, minimalistic still useful that can > > be used as a template when one wants to implement their own handler. > > The documentation should mention about this module. > > - No need for SQL functions for all the in-core handlers: let's just > > return pointers to them based on the options given. > > Agreed. > > > It would be probably cleaner to split the patch so as the code is > > refactored and evaluated with the in-core handlers first, and then > > extended with the pluggable facilities and the function lookups. > > Agreed. > > I've sketched the above idea including a test module in > src/test/module/test_copy_format, based on v2 patch. It's not splitted > and is dirty so just for discussion. > The test_copy_format extension doesn't use the fields of CopyToState and CopyFromState in this patch, I think we should move CopyFromStateData and CopyToStateData to commands/copy.h, what do you think? The framework in the patch LGTM. > > Regards, > > -- > Masahiko Sawada > Amazon Web Services: https://aws.amazon.com -- Regards Junwang Zhao
Re: Add --check option to pgindent
> On 12 Dec 2023, at 01:09, Tristan Partin wrote: > > Not sold on the name, but --check is a combination of --silent-diff and > --show-diff. I envision --check mostly being used in CI environments. I > recently came across a situation where this behavior would have been useful. > Without --check, you're left to capture the output of --show-diff and exit 2 > if the output isn't empty by yourself. I haven’t studied the patch but I can see that being useful. ./daniel
Add --check option to pgindent
Not sold on the name, but --check is a combination of --silent-diff and
--show-diff. I envision --check mostly being used in CI environments. I
recently came across a situation where this behavior would have been
useful. Without --check, you're left to capture the output of
--show-diff and exit 2 if the output isn't empty by yourself.

--
Tristan Partin
Neon (https://neon.tech)

From 1001252d49a47e660045cee3d2ba5abd87e925d9 Mon Sep 17 00:00:00 2001
From: Tristan Partin
Date: Mon, 11 Dec 2023 17:34:17 -0600
Subject: [PATCH v1] Add --check option to pgindent

The option is a combination of --show-diff and --silent-diff. It is
useful for situations such as CI checks where the diff can be logged,
but will also cause the build to fail. Without such an option, people
are left to re-implement the same logic over and over again.
---
 src/tools/pgindent/pgindent | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/src/tools/pgindent/pgindent b/src/tools/pgindent/pgindent
index bce63d95da..a285015c76 100755
--- a/src/tools/pgindent/pgindent
+++ b/src/tools/pgindent/pgindent
@@ -23,7 +23,8 @@ my $devnull = File::Spec->devnull;

 my ($typedefs_file, $typedef_str, @excludes,
 	$indent, $build, $show_diff,
-	$silent_diff, $help, @commits,);
+	$silent_diff, $help, @commits,
+	$check,);

 $help = 0;

@@ -35,7 +36,8 @@ my %options = (
 	"excludes=s" => \@excludes,
 	"indent=s" => \$indent,
 	"show-diff" => \$show_diff,
-	"silent-diff" => \$silent_diff,);
+	"silent-diff" => \$silent_diff,
+	"check" => \$check,);

 GetOptions(%options) || usage("bad command line argument");

 usage() if $help;

@@ -43,6 +45,12 @@ usage() if $help;
 usage("Cannot have both --silent-diff and --show-diff")
   if $silent_diff && $show_diff;

+usage("Cannot have both --check and --show-diff")
+  if $check && $show_diff;
+
+usage("Cannot have both --check and --silent-diff")
+  if $check && $silent_diff;
+
 usage("Cannot use --commit with command line file list")
   if (@commits && @ARGV);

@@ -325,6 +333,7 @@ Options:
 	--indent=PATH    path to pg_bsd_indent program
 	--show-diff      show the changes that would be made
 	--silent-diff    exit with status 2 if any changes would be made
+	--check          combination of --show-diff and --silent-diff

 The --excludes and --commit options can be given more than once.
 EOF
 	  if ($help)

@@ -417,7 +426,12 @@ foreach my $source_filename (@files)

 	if ($source ne $orig_source)
 	{
-		if ($silent_diff)
+		if ($check)
+		{
+			print show_diff($source, $source_filename);
+			exit 2;
+		}
+		elsif ($silent_diff)
 		{
 			exit 2;
 		}

--
Tristan Partin
Neon (https://neon.tech)
Adding comments to help understand psql hidden queries
The --echo-hidden flag in psql is used to show people the way psql
performs its magic for its backslash commands. None of them has more
magic than "\d relation", but it suffers from needing a lot of separate
queries to gather all of the information it needs. Unfortunately, those
queries can get overwhelming, and it is hard to figure out which one
does what, especially for those not already very familiar with the
system catalogs.

Attached is a patch to add a small SQL comment to the top of each
SELECT query inside describeOneTableDetails. All other functions use a
single query, and thus need no additional context. But "\d mytable" has
the potential to run over a dozen SQL queries! The new format looks
like this:

/**************** QUERY *****************/
/* Get information about row-level policies */
SELECT pol.polname, pol.polpermissive,
  CASE WHEN pol.polroles = '{0}' THEN NULL ELSE
    pg_catalog.array_to_string(array(select rolname from pg_catalog.pg_roles
      where oid = any (pol.polroles) order by 1),',') END,
  pg_catalog.pg_get_expr(pol.polqual, pol.polrelid),
  pg_catalog.pg_get_expr(pol.polwithcheck, pol.polrelid),
  CASE pol.polcmd
    WHEN 'r' THEN 'SELECT'
    WHEN 'a' THEN 'INSERT'
    WHEN 'w' THEN 'UPDATE'
    WHEN 'd' THEN 'DELETE'
    END AS cmd
FROM pg_catalog.pg_policy pol
WHERE pol.polrelid = '134384' ORDER BY 1;
/*****************************************/

Cheers,
Greg

psql.echo.hidden.comments.v1.patch
Description: Binary data
Re: POC: Extension for adding distributed tracing - pg_tracing
Hi all,

Just saw this thread. I cooked up a PoC of distributed tracing support
in Pg years ago as part of the 2ndQuadrant BDR project. I used GUCs to
set the `opentelemetry_trace_id` and `opentelemetry_span_id`. These can
be sent as protocol-level messages by the client driver if the client
driver has native trace integration enabled, in which case the user
doesn't need to see or care. And for other drivers, the application can
use `SET` or `SET LOCAL` to assign them.

This approach avoids the need to rewrite SQL or give special meaning to
SQL comments.

Within the server, IIRC I used the existing postgres resource owner and
memory context stacks to maintain some internal nested traces.

My implementation used the C++ opentelemetry client library to emit
traces, so it was never going to be able to land in core. But IIRC the
current BDR/PGD folks are now using an OpenTelemetry sidecar process
instead. I think they send it UDP packets to emit metrics and events.
Petr or Markus might be able to tell you more about how they're doing
that.

I'd love to see OpenTelemetry integration in Pg.
Re: unconstify()/unvolatize() vs g++/clang++
On Mon, Dec 11, 2023 at 11:32 PM Thomas Munro wrote: > On Mon, Dec 11, 2023 at 10:17 PM Peter Eisentraut > wrote: > > If you are slightly more daring, you can write an alternative definition > > in C++ using const_cast? > > Oh, yeah, right, that works for my case. I can't think of any > reasons not to do this, but IANAC++L. And pushed. Thanks!
Re: Sorting regression of text function result since commit 586b98fdf1aae
Jehan-Guillaume de Rorthais writes: > It looks like since 586b98fdf1aae, the result type collation of "convert_from" > is forced to "C", like the patch does for type "name", instead of the > "default" > collation for type "text". Well, convert_from() inherits its result collation from the input, per the normal rules for collation assignment [1]. > Looking at hints in the header comment of function "exprCollation", I poked > around and found that the result collation wrongly follow the input collation > in this case. It's not "wrong", it's what the SQL standard requires. > I couldn't find anything explaining this behavior in the changelog. It looks > like a regression to me, but if this is actually expected, maybe this deserve > some documentation patch? The v12 release notes do say Type name now behaves much like a domain over type text that has default collation “C”. You'd have similar results from an expression involving such a domain, I believe. I'm less than excited about patching the v12 release notes four years later. Maybe, if this point had come up in a more timely fashion, we'd have mentioned it --- but it's hardly possible to cover every potential implication of such a change in the release notes. regards, tom lane [1] https://www.postgresql.org/docs/current/collation.html#COLLATION-CONCEPTS
Sorting regression of text function result since commit 586b98fdf1aae
Hi,

A customer found what looks like a sort regression while testing his
code from v11 on a higher version. We hunted this regression down to
commit 586b98fdf1aae, introduced in v12.

Consider the following test case:

  createdb -l fr_FR.utf8 -T template0 reg
  psql reg <<<"
  BEGIN;
  CREATE TABLE IF NOT EXISTS reg (
    id bigint NOT NULL,
    reg bytea NOT NULL
  );
  INSERT INTO reg VALUES
    (1, convert_to( 'aaa', 'UTF8')),
    (2, convert_to( 'aa}', 'UTF8'));
  SELECT id FROM reg ORDER BY convert_from(reg, 'UTF8');"

In parent commit 68f6f2b7395fe, it results in:

 id
----
  2
  1

And in 586b98fdf1aae:

 id
----
  1
  2

Looking at the plan, the sort nodes are different:

* 68f6f2b7395fe: Sort Key: (convert_from(reg, 'UTF8'::name))
* 586b98fdf1aae: Sort Key: (convert_from(reg, 'UTF8'::name)) COLLATE "C"

It looks like since 586b98fdf1aae, the result type collation of
"convert_from" is forced to "C", like the patch does for type "name",
instead of the "default" collation for type "text".

Looking at hints in the header comment of function "exprCollation", I
poked around and found that the result collation wrongly follows the
input collation in this case.

With 586b98fdf1aae:

  -- 2nd parameter type resolved as "name" so collation forced to "C"
  SELECT id FROM reg ORDER BY convert_from(reg, 'UTF8');
  -- 1
  -- 2

  -- Collation of 2nd parameter is forced to something else
  SELECT id FROM reg ORDER BY convert_from(reg, 'UTF8' COLLATE \"default\");
  -- 2
  -- 1
  -- Sort
  --   Sort Key: (convert_from(reg, 'UTF8'::name COLLATE "default"))
  --   ->  Seq Scan on reg

It seems that because the second parameter type is "name", the result
collation becomes "C" instead of being the collation associated with
the "text" type: "default".

I couldn't find anything explaining this behavior in the changelog. It
looks like a regression to me, but if this is actually expected, maybe
this deserves some documentation patch?

Regards,
Teach predtest about IS [NOT] proofs
Hello,

I recently encountered a case where partial indexes were surprisingly
not being used. The issue is that predtest doesn't understand how
boolean values and IS expressions relate. For example, if I have:

create table foo(i int, bar boolean);
create index on foo(i) where bar is true;

then this query:

select * from foo where i = 1 and bar;

doesn't use the partial index.

Attached is a patch that solves that issue. It also teaches predtest
about quite a few more cases involving BooleanTest expressions (e.g.,
how they relate to NullTest expressions). One thing I could imagine
being an objection is that not all of these warrant cycles in planning.
If that turns out to be the case, there's not a particularly clear line
in my mind about where to draw that line.

As noted in a TODO in the patch itself, I think it may be worth
refactoring the test_predtest module to run the "x, y" case as well as
the "y, x" case with a single call so as to eliminate a lot of
repetition in clause/expression test cases. If reviewers agree that's
desirable, then I could do that as a precursor.

Regards,
James Coleman

v1-0001-Teach-predtest-about-IS-NOT-bool-proofs.patch
Description: Binary data
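For readers less familiar with why the partial-index proof is valid here,
the standalone C sketch below walks through the underlying three-valued
reasoning: a WHERE clause only accepts rows for which it evaluates to true,
so "bar" (used as a clause) can never accept a row that "bar IS TRUE"
rejects. The enum and function names are made up for illustration and have
nothing to do with the patch's actual predtest code.

#include <stdio.h>

/* Three-valued boolean: the states a nullable bool column can take. */
typedef enum { B_FALSE, B_TRUE, B_NULL } tvbool;

/* A WHERE clause "accepts" a row only when it evaluates to true. */
static int clause_bar(tvbool bar)         { return bar == B_TRUE; }
static int clause_bar_is_true(tvbool bar) { return bar == B_TRUE; }

int
main(void)
{
	const char *names[] = {"false", "true", "null"};

	/*
	 * "bar" implies "bar IS TRUE" if every value accepted by the WHERE
	 * clause "bar" is also accepted by the index predicate "bar IS TRUE".
	 */
	for (int v = B_FALSE; v <= B_NULL; v++)
	{
		if (clause_bar((tvbool) v) && !clause_bar_is_true((tvbool) v))
		{
			printf("counterexample at bar = %s\n", names[v]);
			return 1;
		}
	}
	printf("WHERE bar implies the predicate bar IS TRUE: proof holds\n");
	return 0;
}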
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
On Mon, Dec 11, 2023 at 5:56 PM Alexander Korotkov wrote: > > BTW, do we really need to keep around the BTScanOpaqueData.firstPage > > field? Why can't the call to _bt_readpage from _bt_first (and from > > _bt_endpoint) just pass "firstPage=true" as a simple argument? Note > > that the first call to _bt_readpage must take place from _bt_first (or > > from _bt_endpoint). The first _bt_first call is already kind of > > special, in a way that is directly related to this issue. I added some > > comments about that to today's commit c9c0589fda, in fact -- I think > > it's an important issue in general. > > Please, check the attached patchset. Sorry, I forgot the attachment. Here it is. -- Regards, Alexander Korotkov 0001-Remove-BTScanOpaqueData.firstPage-v2.patch Description: Binary data 0002-Fix-requiredOppositeDir-bug-v2.patch Description: Binary data
Re: Synchronizing slots from primary to standby
On Mon, Dec 11, 2023 at 2:41 PM shveta malik wrote: > > > > > 5. > > +synchronize_slots(WalReceiverConn *wrconn) > > { > > ... > > ... > > + /* The syscache access needs a transaction env. */ > > + StartTransactionCommand(); > > + > > + /* > > + * Make result tuples live outside TopTransactionContext to make them > > + * accessible even after transaction is committed. > > + */ > > + MemoryContextSwitchTo(oldctx); > > + > > + /* Construct query to get slots info from the primary server */ > > + initStringInfo(&s); > > + construct_slot_query(&s); > > + > > + elog(DEBUG2, "slot sync worker's query:%s \n", s.data); > > + > > + /* Execute the query */ > > + res = walrcv_exec(wrconn, s.data, SLOTSYNC_COLUMN_COUNT, slotRow); > > + pfree(s.data); > > + > > + if (res->status != WALRCV_OK_TUPLES) > > + ereport(ERROR, > > + (errmsg("could not fetch failover logical slots info " > > + "from the primary server: %s", res->err))); > > + > > + CommitTransactionCommand(); > > ... > > ... > > } > > > > Where exactly in the above code, there is a syscache access as > > mentioned above StartTransactionCommand()? > > > > It is in walrcv_exec (libpqrcv_processTuples). I have changed the > comments to add this info. > Okay, I see that the patch switches context twice once after starting the transaction and the second time after committing the transaction, why is that required? Also, can't we extend the duration of the transaction till the remote_slot information is constructed? I am asking this because the context used is TopMemoryContext which should be used only if we need something specific to be retained at the process level which doesn't seem to be the case here. I have noticed a few other minor things: 1. postgres=# select * from pg_logical_slot_get_changes('log_slot_2', NULL, NULL); ERROR: cannot use replication slot "log_slot_2" for logical decoding DETAIL: This slot is being synced from the primary server. ... ... postgres=# select * from pg_drop_replication_slot('log_slot_2'); ERROR: cannot drop replication slot "log_slot_2" DETAIL: This slot is being synced from the primary. I think the DETAIL message should be the same in the above two cases. 2. +void +WalSndWaitForStandbyConfirmation(XLogRecPtr wait_for_lsn) +{ + List*standby_slots; + + Assert(!am_walsender); + + if (!MyReplicationSlot->data.failover) + return; + + standby_slots = GetStandbySlotList(true); + + ConditionVariablePrepareToSleep(&WalSndCtl->wal_confirm_rcv_cv); ... ... Shouldn't we return if the standby slot names list is NIL unless there is a reason to do ConditionVariablePrepareToSleep() or any of the code following it? -- With Regards, Amit Kapila.
Re: Report planning memory in EXPLAIN ANALYZE
On Mon, Dec 11, 2023 at 2:06 PM jian he wrote: > > On Mon, Dec 4, 2023 at 3:24 PM Ashutosh Bapat > wrote: > > > > On Fri, Dec 1, 2023 at 8:27 AM Andrei Lepikhov > > wrote: > > > > > > On 30/11/2023 18:40, Alvaro Herrera wrote: > > > > Well, SUMMARY is enabled by default with ANALYZE, and I'd rather not > > > > have planner memory consumption displayed by default with all EXPLAIN > > > > ANALYZEs. So yeah, I still think this deserves its own option. > > > > > > > > But let's hear others' opinions on this point. I'm only one vote here. > > > > > > I agree; it should be disabled by default. The fact that memory > > > consumption outputs with byte precision (very uncomfortable for > > > perception) is a sign that the primary audience is developers and for > > > debugging purposes. > > > > That's 2 vs 1. Here's patch with MEMORY option added. Replying to > > Alvaro's earlier relevant comments. > > > > "Include information on planner's memory consumption. Specially, > include the total memory allocated by the planner and net memory that > remains used at the end of the planning. It defaults to > FALSE. > " > doc/src/sgml/ref/explain.sgml > I can view MemoryContextSizeDifference, figure out the meaning. > > But I am not sure "net memory that remains used at the end of the > planning" is the correct description. > (I am not native english speaker). The word "net" is used as an adjective, see [1] [1] https://www.merriam-webster.com/dictionary/net (as an adjective) Does that help? Do you have any other wording proposal? -- Best Wishes, Ashutosh Bapat
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
Hi, Peter!

On Sat, Dec 9, 2023 at 4:29 AM Peter Geoghegan wrote:
> Does this really need to work at the scan key level, rather than at
> the whole-scan level? Wouldn't it make more sense to just totally
> disable it for the whole scan, since we'll barely ever need to do that
> anyway?
>
> My ScalarArrayOpExpr patch will need to disable this optimization,
> since with that patch in place we don't necessarily go through
> _bt_first each time the search-type scan keys must change. We might
> need to check a few tuples from before the _bt_first-wise position of
> the next set of array values, which is a problem with
> opposite-direction-only inequalities (it's a little bit like the
> situation from my test case, actually). That's partly why I'd prefer
> this to work at the whole-scan level (though I also just don't think
> that inventing SK_BT_BT_FIRST makes much sense).
>
> I think that you should make it clearer that this whole optimization
> only applies to required *inequalities*, which can be required in the
> opposite direction *only*. It should be more obvious from looking at
> the code that the optimization doesn't apply to required equality
> strategy scan keys (those are always required in *both* scan
> directions or in neither direction, so unlike the closely related
> prefix optimization added by the same commit, they just can't use the
> optimization that we need to fix here).
>
> BTW, do we really need to keep around the BTScanOpaqueData.firstPage
> field? Why can't the call to _bt_readpage from _bt_first (and from
> _bt_endpoint) just pass "firstPage=true" as a simple argument? Note
> that the first call to _bt_readpage must take place from _bt_first (or
> from _bt_endpoint). The first _bt_first call is already kind of
> special, in a way that is directly related to this issue. I added some
> comments about that to today's commit c9c0589fda, in fact -- I think
> it's an important issue in general.

Please, check the attached patchset. The first patch removes the
BTScanOpaqueData.firstPage field as you proposed. I think this is a
good idea, thank you for the proposal.

Regarding the requiredOppositeDir bug: I don't want to lose the
generality of the optimization. It could work for different cases, for
example:

WHERE col1 > val1 AND col1 < val2
WHERE col1 = val1 AND col2 > val2 AND col2 < val3
WHERE col1 = val1 AND col2 = val2 AND col3 > val3 AND col3 < val4

And there could be additional scan keys, which shouldn't be skipped.
But that shouldn't mean we shouldn't skip the others.

See the second patch for my next proposal to fix the problem. Instead
of relying on _bt_first(), let's rely on the first matched item on the
page. Once it's found, we may skip scan keys required for the opposite
direction. What do you think?

--
Regards,
Alexander Korotkov
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
Will you be in Prague this week? If not this might have to wait.
Re: brininsert optimization opportunity
On 12/11/23 16:00, Alexander Lakhin wrote:
> Hello Tomas and Soumyadeep,
>
> 25.11.2023 23:06, Tomas Vondra wrote:
>> I've done a bit more cleanup on the last version of the patch (renamed
>> the fields to start with bis_ as agreed, rephrased the comments / docs /
>> commit message a bit) and pushed.
>
> Please look at a query, which produces warnings similar to the ones
> observed upthread:
> CREATE TABLE t(a INT);
> INSERT INTO t SELECT x FROM generate_series(1,1) x;
> CREATE INDEX idx ON t USING brin (a);
> REINDEX index CONCURRENTLY idx;
>
> WARNING:  resource was not closed: [1863] (rel=base/16384/16389,
> blockNum=1, flags=0x9380, refcount=1 1)
> WARNING:  resource was not closed: [1862] (rel=base/16384/16389,
> blockNum=0, flags=0x9380, refcount=1 1)
>
> The first bad commit for this anomaly is c1ec02be1.

Thanks for the report. I haven't investigated what the issue is, but it
seems we fail to release the buffers in some situations - I'd bet we
fail to call the cleanup callback in some place, or something like
that. I'll take a look tomorrow.

> May be you would also want to fix in passing some typos/inconsistencies
> introduced with recent brin-related commits:
> s/bs_blkno/bt_blkno/
> s/emptry/empty/
> s/firt/first/
> s/indexinsertcleanup/aminsertcleanup/
> s/ maxRange/ nextRange/
> s/paga /page /

Definitely. Thanks for noticing those!

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: undetected deadlock in ALTER SUBSCRIPTION ... REFRESH PUBLICATION
On 12/11/23 07:12, Amit Kapila wrote: > On Sat, Dec 9, 2023 at 12:16 PM Amit Kapila wrote: >> >> Thanks, I could also reproduce the issue on back branches (tried till >> 12), and the fix works. I'll push this on Monday. >> > > Peter sent one minor suggestion (to write the check differently for > easier understanding) offlist which I addressed and pushed the patch. > Thanks for taking care of fixing this! regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: brininsert optimization opportunity
Hello Tomas and Soumyadeep,

25.11.2023 23:06, Tomas Vondra wrote:
I've done a bit more cleanup on the last version of the patch (renamed
the fields to start with bis_ as agreed, rephrased the comments / docs /
commit message a bit) and pushed.

Please look at a query, which produces warnings similar to the ones
observed upthread:
CREATE TABLE t(a INT);
INSERT INTO t SELECT x FROM generate_series(1,1) x;
CREATE INDEX idx ON t USING brin (a);
REINDEX index CONCURRENTLY idx;

WARNING:  resource was not closed: [1863] (rel=base/16384/16389,
blockNum=1, flags=0x9380, refcount=1 1)
WARNING:  resource was not closed: [1862] (rel=base/16384/16389,
blockNum=0, flags=0x9380, refcount=1 1)

The first bad commit for this anomaly is c1ec02be1.

May be you would also want to fix in passing some typos/inconsistencies
introduced with recent brin-related commits:
s/bs_blkno/bt_blkno/
s/emptry/empty/
s/firt/first/
s/indexinsertcleanup/aminsertcleanup/
s/ maxRange/ nextRange/
s/paga /page /

Best regards,
Alexander
Re: Make COPY format extendable: Extract COPY TO format implementations
On Mon, Dec 11, 2023 at 7:19 PM Michael Paquier wrote: > > On Mon, Dec 11, 2023 at 10:57:15AM +0900, Masahiko Sawada wrote: > > IIUC we cannot create two same name functions with the same arguments > > but a different return value type in the first place. It seems to me > > to be an overkill to change such a design. > > Agreed to not touch the logictics of LookupFuncName() for the sake of > this thread. I have not checked the SQL specification, but I recall > that there are a few assumptions from the spec embedded in the lookup > logic particularly when it comes to specify a procedure name without > arguments. > > > Another idea is to encapsulate copy_to/from_handler by a super class > > like copy_handler. The handler function is called with an argument, > > say copyto, and returns copy_handler encapsulating either > > copy_to/from_handler depending on the argument. > > Yep, that's possible as well and can work as a cross-check between the > argument and the NodeTag assigned to the handler structure returned by > the function. > > At the end, the final result of the patch should IMO include: > - Documentation about how one can register a custom copy_handler. > - Something in src/test/modules/, minimalistic still useful that can > be used as a template when one wants to implement their own handler. > The documentation should mention about this module. > - No need for SQL functions for all the in-core handlers: let's just > return pointers to them based on the options given. Agreed. > It would be probably cleaner to split the patch so as the code is > refactored and evaluated with the in-core handlers first, and then > extended with the pluggable facilities and the function lookups. Agreed. I've sketched the above idea including a test module in src/test/module/test_copy_format, based on v2 patch. It's not splitted and is dirty so just for discussion. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com custom_copy_format_draft.patch Description: Binary data
Re: [CAUTION!! freemail] Re: Partial aggregates pushdown
On Thu, Dec 7, 2023 at 4:12 PM Bruce Momjian wrote: > Second, the patch already has a mechanism to check the remote server > version to see if it is the same or newer. Here is the version check > documentation patch: Right. This feature can certainly be implemented in a backward-compatible way. I'm not sure that we have as much control over what does and does not get pushed down as we really want here, but it's completely possible to do this in a way that doesn't break other use cases. > However, functions don't have pre-created records, and internal > functions don't see to have an SQL-defined structure, but as I remember > the internal aggregate functions all take the same internal structure, > so I guess we only need one fixed input and one output that would > output/input such records. Performance might be an issue, but at this > point let's just implement this and measure the overhead since there are > few/any(?) other viable options. IMHO records will be the easiest approach, but it will be some work to try it. > Fourth, going with #2 where we do the pushdown using an SQL keyword also > allows extensions to automatically work, while requiring partial > aggregate functions for every non-partial aggregate will require work > for extensions, and potentially lead to more version mismatch issues. Yeah. > Finally, I am now concerned that this will not be able to be in PG 17, > which I was hoping for. Getting it ready to ship by March seems very difficult. I'm not saying it couldn't happen, but I think more likely it won't. -- Robert Haas EDB: http://www.enterprisedb.com
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Hi! Thank you for your work. Your patch looks better! On 10.12.2023 13:32, jian he wrote: On Fri, Dec 8, 2023 at 3:09 PM Alena Rybakina wrote: Thank you for your work. Unfortunately, your code contained errors during the make installation: 'SAVEPOINT' after 'SAVE_ERROR' in unreserved_keyword list is misplaced 'SAVEPOINT' after 'SAVE_ERROR' in bare_label_keyword list is misplaced make[2]: *** [../../../src/Makefile.global:783: gram.c] Error 1 make[1]: *** [Makefile:131: parser/gram.h] Error 2 make[1]: *** Waiting for unfinished jobs make: *** [src/Makefile.global:383: submake-generated-headers] Error 2 I have ubuntu 22.04 operation system. On 06.12.2023 13:47, jian he wrote: On Tue, Dec 5, 2023 at 6:07 PM Alena Rybakina wrote: Hi! Thank you for your contribution to this thread. I reviewed it and have a few questions. 1. I have seen that you delete a table before creating it, to which you want to add errors due to a failed "copy from" operation. I think this is wrong because this table can save useful data for the user. At a minimum, we should warn the user about this, but I think we can just add some number at the end of the name, such as name_table1, name_table_2. Sorry. I don't understand this part. Currently, if the error table name already exists, then the copy will fail, an error will be reported. I try to first create a table, if no error then the error table will be dropped. To be honest, first of all, I misunderstood this part of the code. Now I see that it works the way you mentioned. However, I didn't see if you dealt with cases where we already had a table with the same name as the table error. I mean, when is he trying to create for the first time, or will we never be able to face such a problem? Can you demo the expected behavior? Unfortunately, I was unable to launch it due to a build issue. Hopefully attached will work. Yes, thank you! It works fine, and I see that the regression tests have been passed. 🙂 However, when I ran 'copy from with save_error' operation with simple csv files (copy_test.csv, copy_test1.csv) for tables test, test1 (how I created it, I described below): postgres=# create table test (x int primary key, y int not null); postgres=# create table test1 (x int, z int, CONSTRAINT fk_x FOREIGN KEY(x) REFERENCES test(x)); I did not find a table with saved errors after operation, although I received a log about it: postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV save_error NOTICE: 2 rows were skipped because of error. skipped row saved to table public.test_error ERROR: duplicate key value violates unique constraint "test_pkey" DETAIL: Key (x)=(2) already exists. CONTEXT: COPY test, line 3 postgres=# select * from public.test_error; ERROR: relation "public.test_error" does not exist LINE 1: select * from public.test_error; postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ',' CSV save_error NOTICE: 2 rows were skipped because of error. skipped row saved to table public.test1_error ERROR: insert or update on table "test1" violates foreign key constraint "fk_x" DETAIL: Key (x)=(2) is not present in table "test". postgres=# select * from public.test1_error; ERROR: relation "public.test1_error" does not exist LINE 1: select * from public.test1_error; Two lines were written correctly in the csv files, therefore they should have been added to the tables, but they were not added to the tables test and test1. If I leave only the correct rows, everything works fine and the rows are added to the tables. 
in copy_test.csv:
2,0
1,1

in copy_test1.csv:
2,0
2,1
1,1

postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV
COPY 2
postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ',' CSV save_error
NOTICE:  No error happened.Error holding table public.test1_error will be droped
COPY 3

Maybe I'm launching it the wrong way. If so, let me know about it.

I also noticed interesting behavior if the table was previously created
by the user. When I created an error_table before the 'copy from'
operation, I received a message during the 'copy from' operation saying
that it is impossible to create a table with the same name (it is shown
below). I think you should add information about this in the
documentation, since this seems to be normal behavior to me.

postgres=# CREATE TABLE test_error (LINENO BIGINT, LINE TEXT, FIELD TEXT, SOURCE TEXT, ERR_MESSAGE TEXT, ERR_DETAIL TEXT, ERRORCODE TEXT);
CREATE TABLE
postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV save_error
ERROR:  Error save table public.test_error already exists. Cannot use it for COPY FROM error saving

2. I noticed that you are forming a table name using the type of errors
that prevent rows from being added during the 'copy from' operation. I
think it would be better to use the name of the source file that was
used while 'copy from' was run.
Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'
On Fri, Dec 8, 2023 at 11:24 PM Alexander Korotkov wrote: > > Hi, Ashutosh! > > On Fri, Dec 8, 2023 at 3:28 PM Ashutosh Bapat > wrote: > > I did some analysis of memory consumption by bitmapsets in such cases. > > [1] contains slides with the result of this analysis. The slides are > > crude and quite WIP. But they will give some idea. > > > > [1] > > https://docs.google.com/presentation/d/1S9BiAADhX-Fv9tDbx5R5Izq4blAofhZMhHcO1c-wzfI/edit?usp=sharing > > Thank you for sharing your analysis. I understand that usage of a > plain bitmap becomes a problem with a large number of partitions. But > I wonder what does "post proposed fixes" mean? Is it the fixes posted > in [1]. If so it's very surprising for me they are reducing the > memory footprint size. No. These are fixes in various threads all listed together in [1]. I had started investigating memory consumption by Bitmapsets around the same time. The slides are result of that investigation. I have updated slides with this reference. [1] https://www.postgresql.org/message-id/caexhw5s_kwb0rb9l3turjxsvo5uctepdskkaemb5x1etssm...@mail.gmail.com They reduce the memory footprint by Bitmapset because they reduce the objects that contain the bitmapsets, thus reducing the total number of bitmapsets produced. -- Best Wishes, Ashutosh Bapat
Re: Memory consumption during partitionwise join planning
On Thu, Jul 27, 2023 at 7:28 PM Ashutosh Bapat wrote:
> The memory consumption is broken down by the objects that consume memory
> during planning. The second attached patch is used to measure the breakup
> by functionality. Here's a brief explanation of the rows in the table.
>
> 1. Restrictlist translations: Like other expressions the Restrictinfo
> lists of parent are translated to obtain Restrictinfo lists to be
> applied to child partitions (base as well as join). The first row
> shows the memory consumed by the translated RestrictInfos. We can't
> avoid these translations but closer examination reveals that a given
> RestrictInfo gets translated multiple times proportional to the join
> orders. These repeated translations can be avoided. I will start a
> separate thread to discuss this topic.
>
> 2. Paths: this is the memory consumed when creating child join paths
> and the Append paths in parent joins. It includes memory consumed by
> the paths as well as translated expressions. I don't think we can
> avoid creating these paths. But once the best paths are chosen for the
> lower level relations, the unused paths can be freed. I will start a
> separate thread to discuss this topic.
>
> 3. targetlist translation: child join relations' targetlists are
> created by translating parent relations' targetlist. This row shows
> the memory consumed by the translated targetlists. This translation
> can't be avoided.
>
> 4. child SpecialJoinInfo: This is memory consumed in child joins'
> SpecialJoinInfos translated from SpecialJoinInfo applicable to parent
> joins. The child SpecialJoinInfos are translated on the fly when
> computing child joins but are never freed. May be we can free them on
> the fly as well or even better save them somewhere and fetch as and
> when required. I will start a separate thread to discuss this topic.
>
> 5. Child join RelOptInfos: memory consumed by child join relations.
> This is unavoidable as we need the RelOptInfos representing the child
> joins.
>
> Table 3: Partitionwise join planning memory breakup
> Num joins           |     2     |     3     |     4     |     5     |
> --------------------|-----------|-----------|-----------|-----------|
> 1. translated       |   1.8 MiB |  13.1 MiB |  58.0 MiB | 236.5 MiB |
>    restrictlists    |           |           |           |           |
> 2. creating child   |  11.6 MiB |  59.4 MiB | 207.6 MiB | 768.2 MiB |
>    join paths       |           |           |           |           |
> 3. translated       | 723.5 KiB |   3.3 MiB |  10.6 MiB |  28.5 MiB |
>    targetlists      |           |           |           |           |
> 4. child            | 926.8 KiB |   9.0 MiB |  45.7 MiB | 245.5 MiB |
>    SpecialJoinInfo  |           |           |           |           |
> 5. Child join rels  |   1.6 MiB |   7.9 MiB |  23.8 MiB |  67.5 MiB |
>
> While subproblems and their solutions will be discussed in separate
> email threads, this thread is to discuss

I posted these patches long back but forgot to mention those in this
thread. Listing them here in one place.

[1] Memory reduction in SpecialJoinInfo -
https://www.postgresql.org/message-id/flat/caexhw5thqef3asvqvffcghygpfpy7o3xnvhhwbgbjfmrh8k...@mail.gmail.com
[2] Memory consumption reduction in RestrictInfos -
https://www.postgresql.org/message-id/flat/CAExHW5s=bclmmq8n_bn6iu+pjau0ds3z_6dn6ile69esmsp...@mail.gmail.com
[3] Memory consumption reduction in paths -
https://www.postgresql.org/message-id/flat/CAExHW5tUcVsBkq9qT%3DL5vYz4e-cwQNw%3DKAGJrtSyzOp3F%3DXacA%40mail.gmail.com
[4] Small change to reuse child bitmapsets in try_partitionwise_join() -
https://www.postgresql.org/message-id/CAExHW5snUW7pD2RdtaBa1T_TqJYaY6W_YPVjWDrgSf33i-0uqA%40mail.gmail.com

--
Best Wishes,
Ashutosh Bapat
Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression
On Tue, Nov 28, 2023 at 5:40 PM Peter Eisentraut wrote: > On 23.11.23 15:13, Amul Sul wrote: > > The exact sequencing of this seems to be tricky. It's clear that we > > need to do it earlier than at the end. I also think it should be > > strictly after AT_PASS_ALTER_TYPE so that the new expression can > refer > > to the new type of a column. It should also be after > AT_PASS_ADD_COL, > > so that the new expression can refer to any newly added column. But > > then it's after AT_PASS_OLD_INDEX and AT_PASS_OLD_CONSTR, is that a > > problem? > > > > AT_PASS_ALTER_TYPE and AT_PASS_ADD_COL cannot be together, the ALTER TYPE > > cannot see that column, I think we can adopt the same behaviour. > > With your v5 patch, I see the following behavior: > > create table t1 (a int, b int generated always as (a + 1) stored); > alter table t1 add column c int, alter column b set expression as (a + c); > ERROR: 42703: column "c" does not exist > > I think intuitively, this ought to work. Maybe just moving the new pass > after AT_PASS_ADD_COL would do it.

I don't think we can support that (just as ALTER TYPE does not), since we need to place this new pass before AT_PASS_OLD_INDEX & AT_PASS_OLD_CONSTR to re-add indexes and constraints for the validation.

> While looking at this, I figured that converting the AT_PASS_* macros to > an enum would make this code simpler and easier to follow. For patches > like yours you wouldn't have to renumber the whole list. We could put > this patch before yours if people agree with it.

Ok, the 0001 patch does that.

> > I tried to reuse the code by borrowing code from ALTER TYPE, see if that > > looks good to you. > > > > But I have concerns, with that code reuse where we drop and re-add the > > indexes > > and constraints which seems unnecessary for SET EXPRESSION where column > > attributes will stay the same. I don't know why ALTER TYPE does that for > > index > > since finish_heap_swap() anyway does reindexing. We could skip re-adding > > index for SET EXPRESSION which would be fine but we could not skip the > > re-addition of constraints, since rebuilding constraints for checking > might > > need a good amount of code copy especially for foreign key constraints. > > > > Please have a look at the attached version, 0001 patch does the code > > refactoring, and 0002 is the implementation, using the newly refactored > > code to > > re-add indexes and constraints for the validation. Added tests for the > same. > > This looks reasonable after a first reading. (I have found that using > git diff --patience makes the 0001 patch look more readable. Maybe > that's helpful.)

Yeah, the attached version is generated with a better git-diff algorithm for readability. Regards, Amul

v6-0002-Code-refactor-separate-function-to-find-all-depen.patch Description: Binary data v6-0001-Code-refactor-convert-macro-listing-to-enum.patch Description: Binary data v6-0003-Allow-to-change-generated-column-expression.patch Description: Binary data
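To make the behaviour discussed above concrete, here is a minimal SQL sketch using the syntax proposed by this patch set (ALTER COLUMN ... SET EXPRESSION AS is not part of any released PostgreSQL version, and the table and column names are made up):

CREATE TABLE t1 (a int, b int GENERATED ALWAYS AS (a + 1) STORED);

-- The new expression cannot refer to a column added in the same ALTER TABLE
-- statement (the same restriction ALTER TYPE has), so the two changes are
-- issued as separate statements:
ALTER TABLE t1 ADD COLUMN c int DEFAULT 0;
ALTER TABLE t1 ALTER COLUMN b SET EXPRESSION AS (a + c);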
Re: Synchronizing slots from primary to standby
On Mon, Dec 11, 2023 at 1:22 PM Drouvot, Bertrand wrote: > > Hi, > > On 12/8/23 10:06 AM, Amit Kapila wrote: > > On Wed, Dec 6, 2023 at 4:53 PM shveta malik wrote: > >> > >> PFA v43, changes are: > >> > > > > I wanted to discuss 0003 patch about cascading standby's. It is not > > clear to me whether we want to allow physical standbys to further wait > > for cascading standby to sync their slots. If we allow such a feature > > one may expect even primary to wait for all the cascading standby's > > because otherwise still logical subscriber can be ahead of one of the > > cascading standby. > > I've the same feeling here. I think it would probably be expected that > the primary also wait for all the cascading standby. > > > I feel even if we want to allow such a behaviour we > > can do it later once the main feature is committed. > > Agree. > > > I think it would > > be good to just allow logical walsenders on primary to wait for > > physical standbys represented by GUC 'standby_slot_names'. > > That makes sense for me for v1. > > > If we agree > > on that then it would be good to prohibit setting this GUC on standby > > or at least it should be a no-op even if this GUC should be set on > > physical standby. > > I'd prefer to completely prohibit it on standby (to make it very clear it's > not > working at all) as long as one can enable it without downtime once the standby > is promoted (which is the case currently).

And I think the slot-sync worker should exit as well on a cascading standby. Thoughts?

If we agree on the above, then we need a way to distinguish between a first and a cascading standby. I could not find any existing way to do so. One possible approach is to connect to the remote using PrimaryConninfo and run 'pg_is_in_recovery()' there; if it returns true, we are a cascading standby. Any simpler way to achieve this?

thanks Shveta
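For reference, a minimal sketch of the check proposed above, assuming the slot-sync worker simply issues a query over the connection built from primary_conninfo (pg_is_in_recovery() is an existing function, so no new SQL would be needed):

-- Run on the server this standby replicates from; if it returns true,
-- the remote is itself in recovery, i.e. we are a cascading standby and
-- slot synchronization should not run here.
SELECT pg_is_in_recovery();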
Re: Synchronizing slots from primary to standby
On Mon, Dec 11, 2023 at 2:21 PM shveta malik wrote: > > On Mon, Dec 11, 2023 at 1:47 PM Dilip Kumar wrote: > > > > On Fri, Dec 8, 2023 at 2:36 PM Amit Kapila wrote: > > > > > > On Wed, Dec 6, 2023 at 4:53 PM shveta malik > > > wrote: > > > > > > > > PFA v43, changes are: > > > > > > > > > > I wanted to discuss 0003 patch about cascading standby's. It is not > > > clear to me whether we want to allow physical standbys to further wait > > > for cascading standby to sync their slots. If we allow such a feature > > > one may expect even primary to wait for all the cascading standby's > > > because otherwise still logical subscriber can be ahead of one of the > > > cascading standby. I feel even if we want to allow such a behaviour we > > > can do it later once the main feature is committed. I think it would > > > be good to just allow logical walsenders on primary to wait for > > > physical standbys represented by GUC 'standby_slot_names'. If we agree > > > on that then it would be good to prohibit setting this GUC on standby > > > or at least it should be a no-op even if this GUC should be set on > > > physical standby. > > > > > > Thoughts? > > > > IMHO, why not keep the behavior consistent across primary and standby? > > I mean if it doesn't require a lot of new code/design addition then > > it should be the user's responsibility. I mean if the user has set > > 'standby_slot_names' on standby then let standby also wait for > > cascading standby to sync their slots? Is there any issue with that > > behavior? > > > > Without waiting for cascading standby on primary, it won't be helpful > to just wait on standby. > > Currently logical walsenders on primary waits for physical standbys to > take changes before they update their own logical slots. But they wait > only for their immediate standbys and not for cascading standbys. > Although, on first standby, we do have logic where slot-sync workers > wait for cascading standbys before they update their own slots (synced > ones, see patch3). But this does not guarantee that logical > subscribers on primary will never be ahead of the cascading standbys. > Let us consider this timeline: > > t1: logical walsender on primary waiting for standby1 (first standby). > t2: physical walsender on standby1 is stuck and thus there is delay in > sending these changes to standby2 (cascading standby). > t3: standby1 has taken changes and sends confirmation to primary. > t4: logical walsender on primary receives confirmation from standby1 > and updates slot, logical subscribers of primary also receives the > changes. > t5: standby2 has not received changes yet as physical walsender on > standby1 is still stuck, slotsync worker still waiting for standby2 > (cascading) before it updates its own slots (synced ones). > t6: standby2 is promoted to become primary. > > Now we are in a state wherein primary, logical subscriber and first > standby has some changes but cascading standby does not. And logical > slots on primary were updated w/o confirming if cascading standby has > taken changes or not. This is a problem and we do not have a simple > solution for this yet. Okay, I think that makes sense. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is WAL_DEBUG related code still relevant today?
On Fri, Dec 08, 2023 at 01:45:18PM +0900, Michael Paquier wrote: > Looks acceptable to me. Does somebody object to this removal? Hearing nothing, done that. -- Michael signature.asc Description: PGP signature
Re: Make COPY format extendable: Extract COPY TO format implementations
On Sat, Dec 9, 2023 at 7:38 PM Hannu Krosing wrote: > > Hi Junwang > > Please also see my presentation slides from last years PostgreSQL > Conference in Berlin (attached) I read through the slides, really promising ideas, it's will be great if we can get there at last. > > The main Idea is to make not just "format", but also "transport" and > "stream processing" extendable via virtual function tables. The code is really coupled, it is not easy to do all of these in one round, it will be great if you have a POC patch. > > Btw, will any of you here be in Prague next week ? > Would be a good opportunity to discuss this in person. Sorry, no. > > > Best Regards > Hannu > > On Sat, Dec 9, 2023 at 9:39 AM Junwang Zhao wrote: > > > > On Sat, Dec 9, 2023 at 10:43 AM Hayato Kuroda (Fujitsu) > > wrote: > > > > > > Dear Junagn, Sutou-san, > > > > > > Basically I agree your point - improving a extendibility is good. > > > (I remember that this theme was talked at Japan PostgreSQL conference) > > > Below are my comments for your patch. > > > > > > 01. General > > > > > > Just to confirm - is it OK to partially implement APIs? E.g., only COPY > > > TO is > > > available. Currently it seems not to consider a case which is not > > > implemented. > > > > > For partially implements, we can leave the hook as NULL, and check the NULL > > at *ProcessCopyOptions* and report error if not supported. > > > > > 02. General > > > > > > It might be trivial, but could you please clarify how users can extend? > > > Is it OK > > > to do below steps? > > > > > > 1. Create a handler function, via CREATE FUNCTION, > > > 2. Register a handler, via new SQL (CREATE COPY HANDLER), > > > 3. Specify the added handler as COPY ... FORMAT clause. > > > > > My original thought was option 2, but as Michael point, option 1 is > > the right way > > to go. > > > > > 03. General > > > > > > Could you please add document-related tasks to your TODO? I imagined like > > > fdwhandler.sgml. > > > > > > 04. General - copyright > > > > > > For newly added files, the below copyright seems sufficient. See > > > applyparallelworker.c. > > > > > > ``` > > > * Copyright (c) 2023, PostgreSQL Global Development Group > > > ``` > > > > > > 05. src/include/catalog/* files > > > > > > IIUC, 8000 or higher OIDs should be used while developing a patch. > > > src/include/catalog/unused_oids > > > would suggest a candidate which you can use. > > > > Yeah, I will run renumber_oids.pl at last. > > > > > > > > 06. copy.c > > > > > > I felt that we can create files per copying methods, like > > > copy_{text|csv|binary}.c, > > > like indexes. > > > How do other think? > > > > Not sure about this, it seems others have put a lot of effort into > > splitting TO and From. > > Also like to hear from others. > > > > > > > > 07. fmt_to_name() > > > > > > I'm not sure the function is really needed. Can we follow like > > > get_foreign_data_wrapper_oid() > > > and remove the funciton? > > > > I have referenced some code from greenplum, will remove this. > > > > > > > > 08. GetCopyRoutineByName() > > > > > > Should we use syscache for searching a catalog? > > > > > > 09. CopyToFormatTextSendEndOfRow(), CopyToFormatBinaryStart() > > > > > > Comments still refer CopyHandlerOps, whereas it was renamed. > > > > > > 10. copy.h > > > > > > Per foreign.h and fdwapi.h, should we add a new header file and move some > > > APIs? > > > > > > 11. 
copy.h > > > > > > ``` > > > -/* These are private in commands/copy[from|to].c */ > > > -typedef struct CopyFromStateData *CopyFromState; > > > -typedef struct CopyToStateData *CopyToState; > > > ``` > > > > > > Are above changes really needed? > > > > > > 12. CopyFormatOptions > > > > > > Can we remove `bool binary` in future? > > > > > > 13. external functions > > > > > > ``` > > > +extern void CopyToFormatTextStart(CopyToState cstate, TupleDesc tupDesc); > > > +extern void CopyToFormatTextOneRow(CopyToState cstate, TupleTableSlot > > > *slot); > > > +extern void CopyToFormatTextEnd(CopyToState cstate); > > > +extern void CopyFromFormatTextStart(CopyFromState cstate, TupleDesc > > > tupDesc); > > > +extern bool CopyFromFormatTextNext(CopyFromState cstate, ExprContext > > > *econtext, > > > + > > > Datum *values, bool *nulls); > > > +extern void CopyFromFormatTextErrorCallback(CopyFromState cstate); > > > + > > > +extern void CopyToFormatBinaryStart(CopyToState cstate, TupleDesc > > > tupDesc); > > > +extern void CopyToFormatBinaryOneRow(CopyToState cstate, TupleTableSlot > > > *slot); > > > +extern void CopyToFormatBinaryEnd(CopyToState cstate); > > > +extern void CopyFromFormatBinaryStart(CopyFromState cstate, TupleDesc > > > tupDesc); > > > +extern bool CopyFromFormatBinaryNext(CopyFromState cstate, > > > ExprContext *econtext, > > > + > > > Datum *values, bool *nulls); > > > +extern void CopyFromFormatBinaryErrorCallback(CopyFromState cstate); > > > ``` > > > > > > FYI - If you add files for {text|csv|binary}, these declarations can
Re: unconstify()/unvolatize() vs g++/clang++
On Mon, Dec 11, 2023 at 10:17 PM Peter Eisentraut wrote: > If you are slightly more daring, you can write an alternative definition > in C++ using const_cast? Oh, yeah, right, that works for my case. I can't think of any reasons not to do this, but IANAC++L. 0001-Define-unconstify-and-unvolatize-for-C.patch Description: Binary data
Re: Make COPY format extendable: Extract COPY TO format implementations
On Mon, Dec 11, 2023 at 10:57:15AM +0900, Masahiko Sawada wrote: > IIUC we cannot create two same name functions with the same arguments > but a different return value type in the first place. It seems to me > to be an overkill to change such a design.

Agreed to not touch the logistics of LookupFuncName() for the sake of this thread. I have not checked the SQL specification, but I recall that there are a few assumptions from the spec embedded in the lookup logic, particularly when it comes to specifying a procedure name without arguments.

> Another idea is to encapsulate copy_to/from_handler by a super class > like copy_handler. The handler function is called with an argument, > say copyto, and returns copy_handler encapsulating either > copy_to/from_handler depending on the argument.

Yep, that's possible as well and can work as a cross-check between the argument and the NodeTag assigned to the handler structure returned by the function.

At the end, the final result of the patch should IMO include:
- Documentation about how one can register a custom copy_handler.
- Something in src/test/modules/, minimalistic but still useful, that can be used as a template when one wants to implement their own handler. The documentation should mention this module.
- No need for SQL functions for all the in-core handlers: let's just return pointers to them based on the options given.

It would probably be cleaner to split the patch so that the code is refactored and evaluated with the in-core handlers first, and then extended with the pluggable facilities and the function lookups. -- Michael signature.asc Description: PGP signature
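As a rough illustration of the registration style being discussed, here is a hypothetical sketch; the copy_handler type, the handler function name, and the way a FORMAT value resolves to a function are all assumptions based on this thread, not committed API:

-- An extension would install a handler-returning function, similar in
-- spirit to FDW or tablesample handlers:
CREATE FUNCTION jsonlines(internal)
    RETURNS copy_handler
    AS 'MODULE_PATHNAME', 'jsonlines_handler'
    LANGUAGE C STRICT;

-- COPY would then look the handler up from the FORMAT option, while the
-- in-core formats (text/csv/binary) bypass the SQL-function lookup:
COPY mytable TO STDOUT (FORMAT 'jsonlines');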
Re: GUC names in messages
On Mon, Dec 11, 2023 at 10:14:11AM +1100, Peter Smith wrote: > This v5* looks good to me, except it will need some further > modification if PeterE's suggestion [1] to keep quotes for the > MixedCase GUCs is adopted. -errdetail("The database cluster was initialized with CATALOG_VERSION_NO %d," - " but the server was compiled with CATALOG_VERSION_NO %d.", - ControlFile->catalog_version_no, CATALOG_VERSION_NO), +/*- translator: %s is a variable name and %d is its value */ +errdetail("The database cluster was initialized with %s %d," + " but the server was compiled with %s %d.", + "CATALOG_VERSION_NO", Good point. There are a lot of strings that can be shaved from the translations here. src/backend/access/transam/xlog.c: errdetail("The database cluster was initialized with PG_CONTROL_VERSION %d (0x%08x)," src/backend/access/transam/xlog.c: errdetail("The database cluster was initialized with PG_CONTROL_VERSION %d," src/backend/access/transam/xlog.c: errdetail("The database cluster was initialized without USE_FLOAT8_BYVAL" src/backend/access/transam/xlog.c: errdetail("The database cluster was initialized with USE_FLOAT8_BYVAL" I think that you should apply the same conversion for these ones. There is no gain with the 1st and 3rd ones, but the 2nd and 4th one can be grouped together. FWIW, if we don't convert MixedCase GUCs to become mixedcase, I don't think that there is any need to apply quotes to them because they don't really look like natural English words. That's as far as my opinion goes, so feel free to ignore me if the consensus is different. -- Michael signature.asc Description: PGP signature
Re: Adding facility for injection points (or probe points?) for more advanced tests
On Mon, Dec 11, 2023 at 11:09:45AM +0530, Dilip Kumar wrote: > I haven't specifically done a review or testing of this patch, but I > have used this for testing the CLOG group update code with my > SLRU-specific changes and I found it quite helpful to test some of the > concurrent areas where you need to stop processing somewhere in the > middle of the code and testing that area without this kind of > injection point framework is really difficult or may not be even > possible. We wanted to test the case of clog group update where we > can get multiple processes added to a single group and get the xid > status updated by the group leader, you can refer to my test in that > thread[1] (the last patch test_group_commit.patch is using this > framework for testing). Could you be more specific? test_group_commit.patch includes this line but there is nothing specific about this injection point getting used in a test or a callback assigned to it: ./test_group_commit.patch:+ INJECTION_POINT("ClogGroupCommit"); > Overall I feel this framework is quite useful > and easy to use as well. Cool, thanks. -- Michael signature.asc Description: PGP signature
Re: Streaming I/O, vectored I/O (WIP)
On Mon, Dec 11, 2023 at 10:28 PM Heikki Linnakangas wrote: > On 11/12/2023 11:12, Thomas Munro wrote: > > 1. I eventually figured out how to generalise > > compute_remaining_iovec() (as I now call it) so that the existing > > pg_pwritev_with_retry() in file_utils.c could also use it, so that's > > now done in a patch of its own. > > In compute_remaining_iovec(): > > 'source' and 'destination' may point to the same array, in which > > case it is adjusted in-place; otherwise 'destination' must have enough > > space for 'iovcnt' elements. > Is there any use case for not adjusting it in place? > pg_pwritev_with_retry() takes a const iovec array, but maybe just remove > the 'const' and document that it scribbles on it? I guess I just wanted to preserve pg_pwritev_with_retry()'s existing prototype, primarily because it matches standard pwritev()/writev().
Re: Synchronizing slots from primary to standby
On Thu, Dec 7, 2023 at 1:33 PM Peter Smith wrote: > > Hi. > > Here are my review comments for patch v43-0002. > Thanks for the feedback. I have addressed most of these in v45. Please find my response on a few which are pending or are not needed. > == > Commit message > > 1. > The nap time of worker is tuned according to the activity on the primary. > The worker starts with nap time of 10ms and if no activity is observed on > the primary for some time, then nap time is increased to 10sec. And if > activity is observed again, nap time is reduced back to 10ms. > > ~ > /nap time of worker/nap time of the worker/ > /And if/If/ > > ~~~ > > 2. > Slots synced on the standby can be identified using 'sync_state' column of > pg_replication_slots view. The values are: > 'n': none for user slots, > 'i': sync initiated for the slot but waiting for the remote slot on the > primary server to catch up. > 'r': ready for periodic syncs. > > ~ > > /identified using/identified using the/ > > The meaning of "identified by" is unclear to me. It also seems to > clash with later descriptions in system-views.sgml. Please see my > later review comment about it (in the sgml file) > I have rephrased it, please check now and let me know. > == > doc/src/sgml/bgworker.sgml > > 3. > bgw_start_time is the server state during which postgres should start > the process; it can be one of BgWorkerStart_PostmasterStart (start as > soon as postgres itself has finished its own initialization; processes > requesting this are not eligible for database connections), > BgWorkerStart_ConsistentState (start as soon as a consistent state has > been reached in a hot standby, allowing processes to connect to > databases and run read-only queries), and > BgWorkerStart_RecoveryFinished (start as soon as the system has > entered normal read-write state. Note that the > BgWorkerStart_ConsistentState and BgWorkerStart_RecoveryFinished are > equivalent in a server that's not a hot standby), and > BgWorkerStart_ConsistentState_HotStandby (same meaning as > BgWorkerStart_ConsistentState but it is more strict in terms of the > server i.e. start the worker only if it is hot-standby; if it is > consistent state in non-standby, worker will not be started). Note > that this setting only indicates when the processes are to be started; > they do not stop when a different state is reached. > > ~ > > 3a. > This seems to have grown to become just one enormous sentence that is > too hard to read. IMO this should be changed to be a of > possible values instead of a big slab of text. I suspect it could also > be simplified quite a lot -- something like below > > SUGGESTION > bgw_start_time is the server state during which postgres should start > the process. Note that this setting only indicates when the processes > are to be started; they do not stop when a different state is reached. > Possible values are: > > - BgWorkerStart_PostmasterStart (start as soon as postgres itself has > finished its own initialization; processes requesting this are not > eligible for database connections) > > - BgWorkerStart_ConsistentState (start as soon as a consistent state > has been reached in a hot-standby, allowing processes to connect to > databases and run read-only queries) > > - BgWorkerStart_RecoveryFinished (start as soon as the system has > entered normal read-write state. 
Note that the > BgWorkerStart_ConsistentState and BgWorkerStart_RecoveryFinished are > equivalent in a server that's not a hot standby) > > - BgWorkerStart_ConsistentState_HotStandby (same meaning as > BgWorkerStart_ConsistentState but it is more strict in terms of the > server i.e. start the worker only if it is hot-standby; if it is a > consistent state in non-standby, the worker will not be started). > > ~~~ > > 3b. > "i.e. start the worker only if it is hot-standby; if it is consistent > state in non-standby, worker will not be started" > > ~ > > Why is it even necessary to say the 2nd part "if it is consistent > state in non-standby, worker will not be started". It seems redundant > given 1st part says the same, right? > > > == > doc/src/sgml/config.sgml > > 4. > + > +The standbys corresponding to the physical replication slots in > +standby_slot_names must enable > +enable_syncslot for the standbys to receive > +failover logical slots changes from the primary. > + > > 4a. > Somehow "must enable enable_syncslot" seemed strange. Maybe re-word like: > > "must enable slot synchronization (see enable_syncslot)" > > OR > > "must configure enable_syncslot = true" > > ~~~ > > 4b. > (seems like repetitive use of "the standbys") > > /for the standbys to/to/ > > OR > > /for the standbys to/so they can/ > > ~~~ > > 5. >primary_conninfo string, or in a separate > - ~/.pgpass file on the standby server (use > + ~/.pgpass file on the standby server. (use > > This rearranged period seems unrelated to the current patch. Maybe >
Re: backtrace_on_internal_error
On 08.12.23 19:14, Andres Freund wrote: FWIW, I did some analysis on aggregated logs on a larger number of machines, and it does look like that'd be a measurable increase in log volume. There are a few voluminous internal errors in core, but the bigger issue is extensions. They are typically much less disciplined about assigning error codes than core PG is. Good point. Also, during development, I often just put elog(ERROR, "real error later").
Re: Streaming I/O, vectored I/O (WIP)
On 11/12/2023 11:12, Thomas Munro wrote: 1. I eventually figured out how to generalise compute_remaining_iovec() (as I now call it) so that the existing pg_pwritev_with_retry() in file_utils.c could also use it, so that's now done in a patch of its own. In compute_remaining_iovec(): 'source' and 'destination' may point to the same array, in which case it is adjusted in-place; otherwise 'destination' must have enough space for 'iovcnt' elements. Is there any use case for not adjusting it in place? pg_pwritev_with_retry() takes a const iovec array, but maybe just remove the 'const' and document that it scribbles on it? I'm planning to commit these fairly soon. +1 -- Heikki Linnakangas Neon (https://neon.tech)
Re: unconstify()/unvolatize() vs g++/clang++
On 11.12.23 01:42, Thomas Munro wrote: AFAICS you can't use unconstify()/unvolatize() in a static inline function in a .h file, or in a .cpp file, because __builtin_types_compatible_p is only available in C, not C++. Seems like a reasonable thing to want to be able to do, no? I'm not immediately sure what the right fix is; would #if defined(HAVE__BUILTIN_TYPES_COMPATIBLE_P) && !defined(__cplusplus) around the relevant versions of constify()/unvolatize() be too easy? That seems right to me. If you are slightly more daring, you can write an alternative definition in C++ using const_cast?
Re: Streaming I/O, vectored I/O (WIP)
On Sat, Dec 9, 2023 at 10:23 PM Heikki Linnakangas wrote: > Ok, works for me. I finished up making a few more improvements: 1. I eventually figured out how to generalise compute_remaining_iovec() (as I now call it) so that the existing pg_pwritev_with_retry() in file_utils.c could also use it, so that's now done in a patch of its own. 2. FileReadV/FileWriteV patch: * further simplification of the traditional ENOSPC 'guess' * unconstify() changed to raw cast (pending [1]) * fixed the DO_DB()-wrapped debugging code 3. smgrreadv/smgrwritev patch: * improved ENOSPC handling * improve description of EOF and ENOSPC handling * fixed the sizes reported in dtrace static probes * fixed some words in the docs about that * changed error messages to refer to "blocks %u..%u" 4. smgrprefetch-with-nblocks patch has no change, hasn't drawn any comments hopefully because it is uncontroversial. I'm planning to commit these fairly soon. [1] https://www.postgresql.org/message-id/flat/CA%2BhUKGK3OXFjkOyZiw-DgL2bUqk9by1uGuCnViJX786W%2BfyDSw%40mail.gmail.com v4-0001-Provide-helper-routine-for-partial-vector-I-O-ret.patch Description: Binary data v4-0002-Provide-vectored-variants-of-FileRead-and-FileWri.patch Description: Binary data v4-0003-Provide-vectored-variants-of-smgrread-and-smgrwri.patch Description: Binary data v4-0004-Provide-multi-block-smgrprefetch.patch Description: Binary data
Re: Synchronizing slots from primary to standby
On Sun, Dec 10, 2023 at 4:33 PM Amit Kapila wrote: > > On Wed, Dec 6, 2023 at 4:53 PM shveta malik wrote: > > > > v43-002: > > > > Review comments on v43-0002: > = Thanks for the feedback Amit. Addressed these in v45. Please find my response on a few of these. > 1. > synchronize_one_slot() > { > ... > + /* > + * With hot_standby_feedback enabled and invalidations handled > + * apropriately as above, this should never happen. > + */ > + if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn) > + { > + ereport(ERROR, > + errmsg("not synchronizing local slot \"%s\" LSN(%X/%X)" > +" to remote slot's LSN(%X/%X) as synchronization " > +" would move it backwards", remote_slot->name, > +LSN_FORMAT_ARGS(MyReplicationSlot->data.restart_lsn), > +LSN_FORMAT_ARGS(remote_slot->restart_lsn))); > + > + goto cleanup; > ... > } > > After the error, the control won't return, so the above goto doesn't > make any sense. > > 2. > synchronize_one_slot() > { > ... > + /* Search for the named slot */ > + if ((s = SearchNamedReplicationSlot(remote_slot->name, true))) > + { > + SpinLockAcquire(&s->mutex); > + sync_state = s->data.sync_state; > + SpinLockRelease(&s->mutex); > + } > ... > ... > + ReplicationSlotAcquire(remote_slot->name, true); > + > + /* > + * Copy the invalidation cause from remote only if local slot is not > + * invalidated locally, we don't want to overwrite existing one. > + */ > + if (MyReplicationSlot->data.invalidated == RS_INVAL_NONE) > + { > + SpinLockAcquire(&MyReplicationSlot->mutex); > + MyReplicationSlot->data.invalidated = remote_slot->invalidated; > + SpinLockRelease(&MyReplicationSlot->mutex); > + } > + > + /* Skip the sync if slot has been invalidated locally. */ > + if (MyReplicationSlot->data.invalidated != RS_INVAL_NONE) > + goto cleanup; > ... > > It seems useless to acquire the slot if it is locally invalidated in > the first place. Won't it be better if after the search we first check > whether the slot is locally invalidated and take appropriate action? > If we don't acquire the slot first, there could be a race condition that the local slot could be invalidated just after checking the invalidated flag. See InvalidatePossiblyObsoleteSlot() where it invalidates slot directly if the slot is not acquired by other processes. Thus, I have not removed 'ReplicationSlotAcquire' but I have re-structured the code a little bit to get rid of duplicate code in 'if' and 'else' part for invalidation logic. > 3. After doing the above two, I think it doesn't make sense to have > goto at the remaining places in synchronize_one_slot(). We can simply > release the slot and commit the transaction at other places. > > 4. > + * Returns nap time for the next sync-cycle. > + */ > +static long > +synchronize_slots(WalReceiverConn *wrconn) > > Returning nap time from here appears a bit awkward. I think it is > better if this function returns any_slot_updated and then the caller > decides the adjustment of naptime. > > 5. > +synchronize_slots(WalReceiverConn *wrconn) > { > ... > ... > + /* The syscache access needs a transaction env. */ > + StartTransactionCommand(); > + > + /* > + * Make result tuples live outside TopTransactionContext to make them > + * accessible even after transaction is committed. 
> + */ > + MemoryContextSwitchTo(oldctx); > + > + /* Construct query to get slots info from the primary server */ > + initStringInfo(&s); > + construct_slot_query(&s); > + > + elog(DEBUG2, "slot sync worker's query:%s \n", s.data); > + > + /* Execute the query */ > + res = walrcv_exec(wrconn, s.data, SLOTSYNC_COLUMN_COUNT, slotRow); > + pfree(s.data); > + > + if (res->status != WALRCV_OK_TUPLES) > + ereport(ERROR, > + (errmsg("could not fetch failover logical slots info " > + "from the primary server: %s", res->err))); > + > + CommitTransactionCommand(); > ... > ... > } > > Where exactly in the above code, there is a syscache access as > mentioned above StartTransactionCommand()? > It is in walrcv_exec (libpqrcv_processTuples). I have changed the comments to add this info. > 6. > - ~/.pgpass file on the standby server (use > + ~/.pgpass file on the standby server. (use >replication as the database name). > > Why do we need this change? We don't, removed it. > > 7. > + standby. Additionally, similar to creating a logical replication slot > + on the hot standby, hot_standby_feedback should be > + set on the standby and a physical slot between the primary and the > standby > + should be used. > > In this, I don't understand the relation between the first part of the > line: "Additionally, similar to creating a logical replication slot on > the hot standby ..." with the rest. > > 8. > However, > + the slots which were in initiated sync_state ('i) and were not > > A single quote after 'i' is missing. > > 9. > the slots with state 'r' and 'i' can neither be used for logical > + decoded nor dropped by the user. >
Re: Synchronizing slots from primary to standby
On Mon, Dec 11, 2023 at 1:47 PM Dilip Kumar wrote: > > On Fri, Dec 8, 2023 at 2:36 PM Amit Kapila wrote: > > > > On Wed, Dec 6, 2023 at 4:53 PM shveta malik wrote: > > > > > > PFA v43, changes are: > > > > > > > I wanted to discuss 0003 patch about cascading standby's. It is not > > clear to me whether we want to allow physical standbys to further wait > > for cascading standby to sync their slots. If we allow such a feature > > one may expect even primary to wait for all the cascading standby's > > because otherwise still logical subscriber can be ahead of one of the > > cascading standby. I feel even if we want to allow such a behaviour we > > can do it later once the main feature is committed. I think it would > > be good to just allow logical walsenders on primary to wait for > > physical standbys represented by GUC 'standby_slot_names'. If we agree > > on that then it would be good to prohibit setting this GUC on standby > > or at least it should be a no-op even if this GUC should be set on > > physical standby. > > > > Thoughts? > > IMHO, why not keep the behavior consistent across primary and standby? > I mean if it doesn't require a lot of new code/design addition then > it should be the user's responsibility. I mean if the user has set > 'standby_slot_names' on standby then let standby also wait for > cascading standby to sync their slots? Is there any issue with that > behavior? > Without waiting for cascading standby on primary, it won't be helpful to just wait on standby. Currently logical walsenders on primary waits for physical standbys to take changes before they update their own logical slots. But they wait only for their immediate standbys and not for cascading standbys. Although, on first standby, we do have logic where slot-sync workers wait for cascading standbys before they update their own slots (synced ones, see patch3). But this does not guarantee that logical subscribers on primary will never be ahead of the cascading standbys. Let us consider this timeline: t1: logical walsender on primary waiting for standby1 (first standby). t2: physical walsender on standby1 is stuck and thus there is delay in sending these changes to standby2 (cascading standby). t3: standby1 has taken changes and sends confirmation to primary. t4: logical walsender on primary receives confirmation from standby1 and updates slot, logical subscribers of primary also receives the changes. t5: standby2 has not received changes yet as physical walsender on standby1 is still stuck, slotsync worker still waiting for standby2 (cascading) before it updates its own slots (synced ones). t6: standby2 is promoted to become primary. Now we are in a state wherein primary, logical subscriber and first standby has some changes but cascading standby does not. And logical slots on primary were updated w/o confirming if cascading standby has taken changes or not. This is a problem and we do not have a simple solution for this yet. thanks Shveta
Re: [Proposal] Add foreign-server health checks infrastructure
On Mon, Dec 11, 2023 at 2:08 PM Hayato Kuroda (Fujitsu) wrote: > > Dear Fujii-san, Tom, > > Thank you for giving a suggestion! PSA new version. > > > Regarding 0001 patch, on second thought, to me, it seems odd to expose > > a function that doesn't have anything to directly do with PostgreSQL > > as a libpq function. The function simply calls poll() on the socket > > with POLLRDHUP if it is supported. While it is certainly convenient to > > have this function, I'm not sure that it fits within the scope of libpq. > > Thought? > > Current style is motivated by Onder [1] and later discussions. I thought it > might > be useful for other developers, but OK, I can remove changes on libpq modules. > Horiguchi-san has suggested [2] that it might be overkill to use > WaitEventSet() > mechanism, so I kept using poll(). > I reused the same naming as previous version because they actually do > something > Like libpq, but better naming is very welcome. > > > Regarding 0002 patch, the behavior of > > postgres_fdw_verify_connection_states() > > in regards to multiple connections using different user mappings seems > > not well documented. The function seems to return false if either of > > those connections has been closed. > > I did not considered the situation because I have not came up with the > situation > that only one of connections to the same foreign server is broken. > > > This behavior means that it's difficult to identify which connection > > has been closed when there are multiple ones to the given server. > > To make it easier to identify that, it could be helpful to extend > > the postgres_fdw_verify_connection_states() function so that it accepts > > a unique connection as an input instead of just the server name. > > One suggestion is to extend the function so that it accepts > > both the server name and the user name used for the connection, > > and checks the specified connection. If only the server name is specified, > > the function should check all connections to the server and return false > > if any of them are closed. This would be helpful since there is typically > > only one connection to the server in most cases. > > Just to confirm, your point "user name" means a local user, right? > I made a patch for addressing them. > > > Additionally, it would be helpful to extend the > > postgres_fdw_get_connections() > > function to also return the user name used for each connection, > > as currently there is no straightforward way to identify it. > > Added, See 0003. IIUC there is no good way to extract user mapping from its > OID, so I have > added an function to do that and used it. > > > The function name "postgres_fdw_verify_connection_states" may seem > > unnecessarily long to me. A simpler name like > > "postgres_fdw_verify_connection" may be enough? > > Renamed. > > > The patch may not be ready for commit due to the review comments, > > and with the feature freeze approaching in a few days, > > it may not be possible to include this feature in v16. > > It is sad for me, but it is more important for PostgreSQL to add nicer codes. > I changed status to "Needs review" again. I am failing to apply the latest Patch-"v39-0001-postgres_fdw-add-postgres_fdw_verify_connection-.patch" for review. 
Please find the error I am facing: D:\Project\Postgres>git am D:\Project\Patch\v39-0001-postgres_fdw-add-postgres_fdw_verify_connection-.patch error: patch failed: contrib/postgres_fdw/connection.c:117 error: contrib/postgres_fdw/connection.c: patch does not apply hint: Use 'git am --show-current-patch=diff' to see the failed patch Applying: postgres_fdw: add postgres_fdw_verify_connection variants Patch failed at 0001 postgres_fdw: add postgres_fdw_verify_connection variants When you have resolved this problem, run "git am --continue". If you prefer to skip this patch, run "git am --skip" instead. To restore the original branch and stop patching, run "git am --abort". Please rebase and post an updated version of the Patch. Thanks and Regards, Shubham Khanna.
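For reference, this is roughly how the functions discussed in this thread would be used once the patch applies cleanly again; the signatures follow the v39 naming and are not committed API, and the server and user names are made up:

-- Check every cached connection to the named foreign server:
SELECT postgres_fdw_verify_connection('myserver');

-- Check only the connection made for a particular local user mapping:
SELECT postgres_fdw_verify_connection('myserver', 'alice');

-- The extended postgres_fdw_get_connections() would additionally report
-- which local user name each cached connection belongs to:
SELECT * FROM postgres_fdw_get_connections();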
Re: Report planning memory in EXPLAIN ANALYZE
On Mon, Dec 4, 2023 at 3:24 PM Ashutosh Bapat wrote: > > On Fri, Dec 1, 2023 at 8:27 AM Andrei Lepikhov > wrote: > > > > On 30/11/2023 18:40, Alvaro Herrera wrote: > > > Well, SUMMARY is enabled by default with ANALYZE, and I'd rather not > > > have planner memory consumption displayed by default with all EXPLAIN > > > ANALYZEs. So yeah, I still think this deserves its own option. > > > > > > But let's hear others' opinions on this point. I'm only one vote here. > > > > I agree; it should be disabled by default. The fact that memory > > consumption outputs with byte precision (very uncomfortable for > > perception) is a sign that the primary audience is developers and for > > debugging purposes. > > That's 2 vs 1. Here's patch with MEMORY option added. Replying to > Alvaro's earlier relevant comments.

"Include information on planner's memory consumption. Specially, include the total memory allocated by the planner and net memory that remains used at the end of the planning. It defaults to FALSE. " doc/src/sgml/ref/explain.sgml

I can look at MemoryContextSizeDifference and figure out the meaning, but I am not sure "net memory that remains used at the end of the planning" is the correct description. (I am not a native English speaker.)
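For context, a minimal sketch of how the option under discussion would be invoked (MEMORY is the proposed EXPLAIN option from this thread, off by default, and not part of any released PostgreSQL; the table is from pgbench):

-- Planner memory is reported only when explicitly requested:
EXPLAIN (ANALYZE, MEMORY) SELECT * FROM pgbench_accounts WHERE aid = 1;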
Re: Synchronizing slots from primary to standby
On Mon, Dec 11, 2023 at 1:02 PM Peter Smith wrote: > > Here are some review comments for v44-0001 > > ~~~ > > 3. assign_standby_slot_names > > + if (!SplitIdentifierString(standby_slot_names_cpy, ',', &standby_slots)) > + { > + /* This should not happen if GUC checked check_standby_slot_names. */ > + elog(ERROR, "list syntax is invalid"); > + } > > This error here and in validate_standby_slots() are different -- > "list" versus "List". > Note here elog(ERROR,.. is used and in the other place it is part of the detail message. I have suggested in my previous review to make them the same but I overlooked the difference, so I think we should change the message to "invalid list syntax" as it was there previously. > == > src/backend/replication/walsender.c > > > 4. WalSndFilterStandbySlots > > > + foreach(lc, standby_slots_cpy) > + { > + char*name = lfirst(lc); > + XLogRecPtr restart_lsn = InvalidXLogRecPtr; > + bool invalidated = false; > + char*warningfmt = NULL; > + ReplicationSlot *slot; > + > + slot = SearchNamedReplicationSlot(name, true); > + > + if (slot && SlotIsPhysical(slot)) > + { > + SpinLockAcquire(&slot->mutex); > + restart_lsn = slot->data.restart_lsn; > + invalidated = slot->data.invalidated != RS_INVAL_NONE; > + SpinLockRelease(&slot->mutex); > + } > + > + /* Continue if the current slot hasn't caught up. */ > + if (!invalidated && !XLogRecPtrIsInvalid(restart_lsn) && > + restart_lsn < wait_for_lsn) > + { > + /* Log warning if no active_pid for this physical slot */ > + if (slot->active_pid == 0) > + ereport(WARNING, > + errmsg("replication slot \"%s\" specified in parameter \"%s\" does > not have active_pid", > +name, "standby_slot_names"), > + errdetail("Logical replication is waiting on the " > + "standby associated with \"%s\"", name), > + errhint("Consider starting standby associated with " > + "\"%s\" or amend standby_slot_names", name)); > + > + continue; > + } > + else if (!slot) > + { > + /* > + * It may happen that the slot specified in standby_slot_names GUC > + * value is dropped, so let's skip over it. > + */ > + warningfmt = _("replication slot \"%s\" specified in parameter > \"%s\" does not exist, ignoring"); > + } > + else if (SlotIsLogical(slot)) > + { > + /* > + * If a logical slot name is provided in standby_slot_names, issue > + * a WARNING and skip it. Although logical slots are disallowed in > + * the GUC check_hook(validate_standby_slots), it is still > + * possible for a user to drop an existing physical slot and > + * recreate a logical slot with the same name. Since it is > + * harmless, a WARNING should be enough, no need to error-out. > + */ > + warningfmt = _("cannot have logical replication slot \"%s\" in > parameter \"%s\", ignoring"); > + } > + else if (XLogRecPtrIsInvalid(restart_lsn) || invalidated) > + { > + /* > + * Specified physical slot may have been invalidated, so there is no point > + * in waiting for it. > + */ > + warningfmt = _("physical slot \"%s\" specified in parameter \"%s\" > has been invalidated, ignoring"); > + } > + else > + { > + Assert(restart_lsn >= wait_for_lsn); > + } > > This if/else chain seems structured awkwardly. IMO it would be tidier > to eliminate the NULL slot and IsLogicalSlot up-front, which would > also simplify some of the subsequent conditions > > SUGGESTION > > slot = SearchNamedReplicationSlot(name, true); > > if (!slot) > { > ... > } > else if (SlotIsLogical(slot)) > { > ... 
> } > else > { > Assert(SlotIsPhysical(slot)) > > SpinLockAcquire(&slot->mutex); > restart_lsn = slot->data.restart_lsn; > invalidated = slot->data.invalidated != RS_INVAL_NONE; > SpinLockRelease(&slot->mutex); > > if (XLogRecPtrIsInvalid(restart_lsn) || invalidated) > { > ... > } > else if (!invalidated && !XLogRecPtrIsInvalid(restart_lsn) && > restart_lsn < wait_for_lsn) > { > ... > } > else > { > Assert(restart_lsn >= wait_for_lsn); > } > } > +1. -- With Regards, Amit Kapila.
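To make the GUC being validated here concrete, a small configuration sketch (standby_slot_names is introduced by this patch set and is not a released GUC; the slot names are made up):

-- Logical walsenders on the primary would wait until these physical slots
-- have confirmed the WAL before advancing failover-enabled logical slots:
ALTER SYSTEM SET standby_slot_names = 'standby1_slot, standby2_slot';
SELECT pg_reload_conf();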
Re: SQL:2011 application time
hi. some small issues diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c index e3ccf6c7f7..6781e55020 100644 --- a/src/backend/tcop/utility.c +++ b/src/backend/tcop/utility.c @@ -1560,7 +1560,7 @@ ProcessUtilitySlow(ParseState *pstate, true, /* check_rights */ true, /* check_not_in_use */ false, /* skip_build */ - false); /* quiet */ + false); /* quiet */ Is the above part unnecessary? diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile index 199eae525d..d04c75b398 100644 --- a/src/backend/utils/adt/Makefile +++ b/src/backend/utils/adt/Makefile @@ -78,6 +78,7 @@ OBJS = \ oracle_compat.o \ orderedsetaggs.o \ partitionfuncs.o \ + period.o \ pg_locale.o \ pg_lsn.o \ pg_upgrade_support.o \ diff --git a/src/backend/utils/adt/period.c b/src/backend/utils/adt/period.c new file mode 100644 index 00..0ed4304e16 --- /dev/null +++ b/src/backend/utils/adt/period.c @@ -0,0 +1,56 @@ +/*- + * + * period.c + * Functions to support periods. + * + * + * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * + * IDENTIFICATION + * src/backend/utils/adt/period.c + * + *- + */ +#include "postgres.h" + +#include "executor/tuptable.h" +#include "fmgr.h" +#include "nodes/primnodes.h" +#include "utils/fmgrprotos.h" +#include "utils/period.h" +#include "utils/rangetypes.h" + +Datum period_to_range(TupleTableSlot *slot, int startattno, int endattno, Oid rangetype) +{ + Datum startvalue; + Datum endvalue; + Datum result; + bool startisnull; + bool endisnull; + LOCAL_FCINFO(fcinfo, 2); + FmgrInfo flinfo; + FuncExpr *f; + + InitFunctionCallInfoData(*fcinfo, &flinfo, 2, InvalidOid, NULL, NULL); + f = makeNode(FuncExpr); + f->funcresulttype = rangetype; + flinfo.fn_expr = (Node *) f; + flinfo.fn_extra = NULL; + + /* compute oldvalue */ + startvalue = slot_getattr(slot, startattno, &startisnull); + endvalue = slot_getattr(slot, endattno, &endisnull); + + fcinfo->args[0].value = startvalue; + fcinfo->args[0].isnull = startisnull; + fcinfo->args[1].value = endvalue; + fcinfo->args[1].isnull = endisnull; + + result = range_constructor2(fcinfo); + if (fcinfo->isnull) + elog(ERROR, "function %u returned NULL", flinfo.fn_oid); + + return result; +} I am confused. so now I only apply v19, 0001 to 0003. period_to_range function never used. maybe we can move this part to 0005-Add PERIODs.patch? Also you add change in Makefile in 0003, meson.build change in 0005, better put it on in 0005? diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 5b110ca7fe..d54d84adf6 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y +/* + * We need to handle this shift/reduce conflict: + * FOR PORTION OF valid_at FROM INTERVAL YEAR TO MONTH TO foo. + * This is basically the classic "dangling else" problem, and we want a + * similar resolution: treat the TO as part of the INTERVAL, not as part of + * the FROM ... TO Users can add parentheses if that's a problem. + * TO just needs to be higher precedence than YEAR_P etc. + * TODO: I need to figure out a %prec solution before this gets committed! + */ +%nonassoc YEAR_P MONTH_P DAY_P HOUR_P MINUTE_P +%nonassoc TO this part will never happen? since "FROM INTERVAL YEAR TO MONTH TO" means "valid_at" will be interval range data type, which does not exist now. 
ri_PerformCheck(riinfo, &qkey, qplan, fk_rel, pk_rel, oldslot, NULL, + targetRangeParam, targetRange, true, /* must detect new rows */ SPI_OK_SELECT); @@ -905,6 +922,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS) ri_PerformCheck(riinfo, &qkey, qplan, fk_rel, pk_rel, oldslot, NULL, + -1, 0, true, /* must detect new rows */ SPI_OK_DELETE); @@ -1026,6 +1044,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS) ri_PerformCheck(riinfo, &qkey, qplan, fk_rel, pk_rel, oldslot, newslot, + -1, 0, true, /* must detect new rows */ SPI_OK_UPDATE); @@ -1258,6 +1277,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind) ri_PerformCheck(riinfo, &qkey, qplan, fk_rel, pk_rel, oldslot, NULL, + -1, 0, true, /* must detect new rows */ SPI_OK_UPDATE); @@ -2520,6 +2540,7 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo, RI_QueryKey *qkey, SPIPlanPtr qplan, Relation fk_rel, Relation pk_rel, TupleTableSlot *oldslot, TupleTableSlot *newslot, + int forPortionOfParam, Datum forPortionOf, bool detectNewRows, int expect_OK) for all the refactor related to ri_PerformCheck, do you need (Datum) 0 instead of plain 0? + + If the table has a range column or + PERIOD, + you may supply a FOR PORTION OF clause, and your delete will + only affect rows that overlap the given interval. Furthermore, if a row's span https://influentialpoints.com/Training/basic_statistics_ranges.htm#:~:tex
Re: Synchronizing slots from primary to standby
On Fri, Dec 8, 2023 at 2:36 PM Amit Kapila wrote: > > On Wed, Dec 6, 2023 at 4:53 PM shveta malik wrote: > > > > PFA v43, changes are: > > > > I wanted to discuss 0003 patch about cascading standby's. It is not > clear to me whether we want to allow physical standbys to further wait > for cascading standby to sync their slots. If we allow such a feature > one may expect even primary to wait for all the cascading standby's > because otherwise still logical subscriber can be ahead of one of the > cascading standby. I feel even if we want to allow such a behaviour we > can do it later once the main feature is committed. I think it would > be good to just allow logical walsenders on primary to wait for > physical standbys represented by GUC 'standby_slot_names'. If we agree > on that then it would be good to prohibit setting this GUC on standby > or at least it should be a no-op even if this GUC should be set on > physical standby. > > Thoughts? IMHO, why not keep the behavior consistent across primary and standby? I mean if it doesn't require a lot of new code/design addition then it should be the user's responsibility. I mean if the user has set 'standby_slot_names' on standby then let standby also wait for cascading standby to sync their slots? Is there any issue with that behavior? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com