Re: Improving tracking/processing of buildfarm test failures

2024-05-24 Thread Amit Kapila
g about this. It makes sense to start with something simple and see how it works. I think this can also help us whether we need to chase a particular BF failure immediately after committing. -- With Regards, Amit Kapila.

Re: speed up a logical replica setup

2024-05-23 Thread Amit Kapila
On Thu, May 23, 2024 at 8:43 PM Euler Taveira wrote: > > On Thu, May 23, 2024, at 5:54 AM, Amit Kapila wrote: > > > Why in the first place do we need to ensure that primary_slot_name is > active on the primary? You mentioned something related to WAL > retention

Re: speed up a logical replica setup

2024-05-23 Thread Amit Kapila
On Wed, May 22, 2024 at 8:46 PM Euler Taveira wrote: > > On Wed, May 22, 2024, at 8:19 AM, Amit Kapila wrote: > > > v2-0002: not changed > > > > We have added more tries to see if the primary_slot_name becomes > active but I think it is still fragile because it

Re: State of pg_createsubscriber

2024-05-23 Thread Amit Kapila
On Wed, May 22, 2024 at 8:02 PM Robert Haas wrote: > > On Mon, May 20, 2024 at 2:42 AM Amit Kapila wrote: > > Just to summarize, apart from BF failures for which we had some > > discussion, I could recall the following open points: > > > > 1. After promotion, the pr

Re: State of pg_createsubscriber

2024-05-23 Thread Amit Kapila
e where it can be helpful for her. For example, retaining publications can help in creating a bi-directional setup. -- With Regards, Amit Kapila.

Re: State of pg_createsubscriber

2024-05-22 Thread Amit Kapila
tion in this tool was to avoid specifying slots, pubs, etc. for each database. See [1]. We can probably leave if the same is not important but we never reached the conclusion of same. [1] - https://www.postgresql.org/message-id/CAA4eK1%2Br96SyHYHx7BaTtGX0eviqpbbkSu01MEzwV5b2VFXP6g%40mail.gmail.com -- With Regards, Amit Kapila.

Re: speed up a logical replica setup

2024-05-22 Thread Amit Kapila
.org/message-id/CAA4eK1JJq_ER6Kq_H%3DjKHR75QPRd8y9_D%3DRtYw%3DaPYKMfqLi9A%40mail.gmail.com [2] - https://www.postgresql.org/message-id/CAA4eK1LT3Z13Dg6p4Z%2B4adO_EY-ow5CmWfikEmBfL%3DeVrm8CPw%40mail.gmail.com -- With Regards, Amit Kapila.

Re: Fix src/test/subscription/t/029_on_error.pl test when wal_debug is enabled

2024-05-20 Thread Amit Kapila
On Fri, May 17, 2024 at 5:25 AM Michael Paquier wrote: > > On Thu, May 16, 2024 at 09:00:47AM +0530, Amit Kapila wrote: > > This can only be a problem if the apply worker generates more LOG > > entries with the required context but it won't do that unless there i

Re: speed up a logical replica setup

2024-05-20 Thread Amit Kapila
like that on my own local Linux + EXEC_BACKEND test run > (sorry I didn't keep the details around). Coincidence? > AFAICS, this is the same as one of the two BF failures being discussed in this thread. -- With Regards, Amit Kapila.

Re: State of pg_createsubscriber

2024-05-20 Thread Amit Kapila
-b277-4c13-850a-8ccc04de1406%40eisentraut.org#152dacecfc8f0cf08cbd8ecb79d6d38f [3] - https://www.postgresql.org/message-id/CAA4eK1KdCb%2B5sjYu6qCMXXdCX1y_ihr8kFzMozq0%3DP%3DauYxgog%40mail.gmail.com [4] - https://www.postgresql.org/message-id/flat/3fa9ef0f-b277-4c13-850a-8ccc04de1406%40eisentraut.org#152dacecfc8f0cf08cbd8ecb79d6d38f -- With Regards, Amit Kapila.

Re: Fix src/test/subscription/t/029_on_error.pl test when wal_debug is enabled

2024-05-15 Thread Amit Kapila
On Thu, May 16, 2024 at 3:43 AM Michael Paquier wrote: > > On Wed, May 15, 2024 at 05:58:18PM +0530, Amit Kapila wrote: > > I guess it could be more work if we want to enhance the test for > > ERRORs other than the primary key violation. > > And? You could pass t

Re: First draft of PG 17 release notes

2024-05-15 Thread Amit Kapila
n chunks up to io_combine_limit". > > Yes, my point is that it is not the number of system calls or system > call overhead that is the advantage of this patch, but the ability to > request more of the I/O system in one call, which is not the same as > system calls. > > I can use your wording, but how much prefetching to we enable now? > Shouldn't we need to include commit b5a9b18cd0bc6f0124664999b31a00a264d16913 with this item? -- With Regards, Amit Kapila.

Re: Fix src/test/subscription/t/029_on_error.pl test when wal_debug is enabled

2024-05-15 Thread Amit Kapila
below line in the test: # Check replicated data my $res = $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl"); is($res, $expected, $msg); -- With Regards, Amit Kapila.

Re: Control flow in logical replication walsender

2024-05-07 Thread Amit Kapila
it, they are transferred to topmost transaction's RB. > I don't think they are transferred to topmost transaction's RB. We perform a k-way merge between transactions/subtransactions to retrieve the changes. See comments: "Support for efficiently iterating over a transaction's and its subtransactions' changes..." in reorderbuffer.c -- With Regards, Amit Kapila.

Re: Control flow in logical replication walsender

2024-05-07 Thread Amit Kapila
d a mitigation. > In PG-14, we have added a feature in logical replication to stream long in-progress transactions which should reduce spilling to a good extent. You might want to try that. -- With Regards, Amit Kapila.

Re: TerminateOtherDBBackends code comments inconsistency.

2024-05-06 Thread Amit Kapila
> to bgworkers whether or not they set proc->roleId. > Agreed. -- With Regards, Amit Kapila.

Re: speed up a logical replica setup

2024-04-30 Thread Amit Kapila
On Tue, Apr 30, 2024 at 12:04 PM Amit Kapila wrote: > > On Mon, Apr 29, 2024 at 5:28 PM Amit Kapila wrote: > > > > On Mon, Apr 29, 2024 at 5:23 PM Euler Taveira wrote: > > > > I was trying to test this utility when 'sync_replication_slots' is on > and it ge

Re: speed up a logical replica setup

2024-04-30 Thread Amit Kapila
On Mon, Apr 29, 2024 at 5:28 PM Amit Kapila wrote: > > On Mon, Apr 29, 2024 at 5:23 PM Euler Taveira wrote: > I was trying to test this utility when 'sync_replication_slots' is on and it gets in an ERROR loop [1] and never finishes. Please find the postgresql.auto used on the standby

Re: TerminateOtherDBBackends code comments inconsistency.

2024-04-29 Thread Amit Kapila
On Tue, Apr 30, 2024 at 2:58 AM Noah Misch wrote: > > On Mon, Apr 29, 2024 at 10:18:35AM +0530, Amit Kapila wrote: > > On Mon, Apr 22, 2024 at 9:56 PM Noah Misch wrote: > > > 3a9b18b309 didn't change the docs of pg_terminate_backend and whatever > > is mentioned w

Re: speed up a logical replica setup

2024-04-29 Thread Amit Kapila
On Mon, Apr 29, 2024 at 5:23 PM Euler Taveira wrote: > > On Mon, Apr 29, 2024, at 6:56 AM, Amit Kapila wrote: > > On Wed, Mar 27, 2024 at 1:47 AM Euler Taveira wrote: > > > > On Tue, Mar 26, 2024, at 4:12 PM, Tomas Vondra wrote: > > > > Perhaps I'm missing som

Re: speed up a logical replica setup

2024-04-29 Thread Amit Kapila
MPTS (that was renamed to > NUM_ATTEMPTS in the first patch). See second patch. > How can we guarantee that the slot can become active after these many attempts? It still could be possible that on some slow machines it didn't get activated even after NUM_ATTEMPTS. BTW, in the first place, why

Re: speed up a logical replica setup

2024-04-29 Thread Amit Kapila
ther than indirectly verifying the same (by checking pg_stat_wal_receiver) as we are doing currently. -- With Regards, Amit Kapila.

Re: TerminateOtherDBBackends code comments inconsistency.

2024-04-28 Thread Amit Kapila
On Mon, Apr 22, 2024 at 9:56 PM Noah Misch wrote: > > On Mon, Apr 15, 2024 at 11:17:54AM +0530, Amit Kapila wrote: > > On Thu, Apr 11, 2024 at 6:58 PM Kirill Reshke > > wrote: > > > > > > While working on [0] i have noticed this comment in >

Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-04-28 Thread Amit Kapila
as active, > > which could confuse users. Do we want to somehow deal with it? > > Yes. As long as the temporary slot is lying unused holding up > resources for more than the specified > replication_slot_inactive_timeout, it is bound to get invalidated. > This keeps behaviour consistent and less-confusing to the users. > Agreed. We may want to add something in the docs for this to avoid confusion with the active flag. -- With Regards, Amit Kapila.

Re: promotion related handling in pg_sync_replication_slots()

2024-04-25 Thread Amit Kapila
On Wed, Apr 24, 2024 at 10:28 AM shveta malik wrote: > > Modified the logic to remove only synced temporary slots during > SQL-function exit. > > Please find v11 with above changes. > LGTM, so pushed! -- With Regards, Amit Kapila.

Re: Race condition in FetchTableStates() breaks synchronization of subscription tables

2024-04-25 Thread Amit Kapila
h, > > v7-0001-Table-sync-missed-due-to-race-condition-in-subscr_PG15.patch > > applies for PG15 branch. > > Thanks, I have verified that the patches can be applied cleanly and fix the > issue on each branch. The regression test can also pass after applying the > patch > on my machine. > Pushed. -- With Regards, Amit Kapila.

Re: Race condition in FetchTableStates() breaks synchronization of subscription tables

2024-04-24 Thread Amit Kapila
On Tue, Apr 23, 2024 at 4:53 PM Amit Kapila wrote: > > On Wed, Mar 13, 2024 at 11:59 AM vignesh C wrote: > > > > On Wed, 13 Mar 2024 at 10:12, Zhijie Hou (Fujitsu) > > wrote: > > > > > > > > > For 0002, instead of avoid resetting the latch,

Re: Race condition in FetchTableStates() breaks synchronization of subscription tables

2024-04-24 Thread Amit Kapila
doesn't fit > > well. > > Thanks for reviewing the patch, the attached v6 patch has the changes > for the same. > v6_0001* looks good to me. This should be backpatched unless you or others think otherwise. -- With Regards, Amit Kapila.

Re: Disallow changing slot's failover option in transaction block

2024-04-23 Thread Amit Kapila
> > > comments. > > Thanks! > > > Tested the patch, works well. > > Same here. > Pushed! -- With Regards, Amit Kapila.

Re: subscription/026_stats test is intermittently slow?

2024-04-23 Thread Amit Kapila
the issues reported in the thread you referred to has the same symptoms [1]. I'll review and analyze your proposal. [1] - https://www.postgresql.org/message-id/858a7622-2c81-1687-d1df-1322dfcb2e72%40gmail.com -- With Regards, Amit Kapila.

Re: Race condition in FetchTableStates() breaks synchronization of subscription tables

2024-04-23 Thread Amit Kapila
to wake up WaitForReplicationWorkerAttach() say condition variable? The current proposal can fix the issue but looks bit adhoc. -- With Regards, Amit Kapila.

Re: promotion related handling in pg_sync_replication_slots()

2024-04-22 Thread Amit Kapila
main as-is after promotion and we need to document for users to remove such slots. Now, we can do that if we want but I think it is better to clean up such slots rather than putting the onus on users to remove them after promotion. -- With Regards, Amit Kapila.

Re: promotion related handling in pg_sync_replication_slots()

2024-04-22 Thread Amit Kapila
On Fri, Apr 19, 2024 at 1:52 PM shveta malik wrote: > > Please find v9 with the above comments addressed. > I have made minor modifications in the comments and a function name. Please see the attached top-up patch. Apart from this, the patch looks good to me. -- With Regards, Amit Kap

Re: Disallow changing slot's failover option in transaction block

2024-04-17 Thread Amit Kapila
ounds logical to me and consistent with ALTER SUBSCRIPTION. BTW, IIUC, you are referring to: "When altering the slot_name, the failover and two_phase property values of the named slot may differ from the counterpart failover and two_phase parameters specified in the subscription. When creating the slot, ensure the slot properties failover and two_phase match their counterpart parameters of the subscription." in docs [1], right? -- With Regards, Amit Kapila.

Re: promotion related handling in pg_sync_replication_slots()

2024-04-16 Thread Amit Kapila
On Tue, Apr 16, 2024 at 12:03 PM Bertrand Drouvot wrote: > > On Tue, Apr 16, 2024 at 08:21:04AM +0530, Amit Kapila wrote: > > > There is no clear use case for allowing them in parallel and I feel it > > would add more confusion when it can work sometimes but not other >

Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-04-16 Thread Amit Kapila
he publisher .) to solve this problem? [1] - https://www.postgresql.org/message-id/CAA4eK1K1fSkeK%3Dkc26G5cq87vQG4%3D1qs_b%2Bno4%2Bep654SeBy1w%40mail.gmail.com -- With Regards, Amit Kapila.

Re: promotion related handling in pg_sync_replication_slots()

2024-04-15 Thread Amit Kapila
On Mon, Apr 15, 2024 at 7:47 PM Bertrand Drouvot wrote: > > On Mon, Apr 15, 2024 at 06:29:49PM +0530, Amit Kapila wrote: > > On Mon, Apr 15, 2024 at 6:21 PM Bertrand Drouvot > > wrote: > > > > > > On Mon, Apr 15, 2024 at 03:38:53PM +0530, shveta malik wr

Re: promotion related handling in pg_sync_replication_slots()

2024-04-15 Thread Amit Kapila
On Mon, Apr 15, 2024 at 6:21 PM Bertrand Drouvot wrote: > > On Mon, Apr 15, 2024 at 03:38:53PM +0530, shveta malik wrote: > > On Mon, Apr 15, 2024 at 2:29 PM Amit Kapila wrote: > > > > > > On Fri, Apr 12, 2024 at 5:25 PM shveta malik > > > wrote: > &g

Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-04-15 Thread Amit Kapila
same as >confirmed_flush and records behind this won't be decoded. > I don't understand the exact problem you are facing. IIUC, if the commit is after start_decoding_at point and prepare was before it, we expect to send the entire transaction followed by a commit record. The restart_lsn should be before the start of such a transaction and we should have recorded the changes in the reorder buffer. -- With Regards, Amit Kapila.

Re: promotion related handling in pg_sync_replication_slots()

2024-04-15 Thread Amit Kapila
and SQL function pg_sync_replication_slots. */ typedef struct SlotSyncCtxStruct { pid_t pid; bool stopSignaled; bool syncing; time_t last_start_time; slock_t mutex; } SlotSyncCtxStruct; I feel the above comment is no longer valid after this patch. We can probably remove this altogether. -- With Regards, Amit Kapila.

Re: Fix possible dereference null pointer (src/backend/replication/logical/reorderbuffer.c)

2024-04-15 Thread Amit Kapila
n, ReorderBufferTXN *subtxn) { - Assert(subtxn->toplevel_xid == txn->xid); Is there a benefit in converting this assertion using toptxn? -- With Regards, Amit Kapila.

Re: TerminateOtherDBBackends code comments inconsistency.

2024-04-14 Thread Amit Kapila
[1] for the reason given in the commit message. I have added Noah to see if he has any suggestions on this matter. [1] - commit 3a9b18b3095366cd0c4305441d426d04572d88c1 Author: Noah Misch Date: Mon Nov 6 06:14:13 2023 -0800 Ban role pg_signal_backend from more superuser backend types. -- With Regards, Amit Kapila.

Re: promotion related handling in pg_sync_replication_slots()

2024-04-12 Thread Amit Kapila
spinlock for synced flag. > > If the sync via SQL function is already half-way, then promotion > should wait for it to finish. I don't think it is a good idea to move > the check to synchronize_one_slot(). The sync-call should either not > start (if it noticed the promotion) or finish the sync and then let > promotion proceed. But I would like to know others' opinion on this. > Agreed. -- With Regards, Amit Kapila.

Re: promotion related handling in pg_sync_replication_slots()

2024-04-12 Thread Amit Kapila
On Fri, Apr 12, 2024 at 7:47 AM shveta malik wrote: > > On Sat, Apr 6, 2024 at 11:49 AM Amit Kapila wrote: > > > > > > Few comments: > > == > > 1. > > void > > SyncReplicationSlots(WalReceiverConn *wrconn) > > { > > + /

Re: pg_upgrde failed : logical replication : alter_subscription_add_log

2024-04-12 Thread Amit Kapila
1]. [1] - https://www.postgresql.org/docs/devel/logical-replication.html -- With Regards, Amit Kapila

Re: Synchronizing slots from primary to standby

2024-04-11 Thread Amit Kapila
On Thu, Apr 11, 2024 at 5:04 PM Zhijie Hou (Fujitsu) wrote: > > On Thursday, April 11, 2024 12:11 PM Amit Kapila > wrote: > > > > > 2. > > - if (remote_slot->restart_lsn < slot->data.restart_lsn) > > + if (remote_slot->confirmed_lsn < slot->

Re: pg_upgrde failed : logical replication : alter_subscription_add_log

2024-04-11 Thread Amit Kapila
g_name" > "text", "log_to_file" boolean, "log_to_table" "regclass", "conflict_type" > "text"[], "conflict_resolution" "text"[]); > > Am I missing any packages? > We don't maintain pglogical so difficult to answer but looking at the error (ERROR: could not find function "pglogical_alter_subscription_add_log" in file "/usr/pgsql-15/lib/pglogical.so"), it seems that the required function is not present in pglogical.so. It is possible that the arguments would have changed in newer version of pglogical or something like that. You need to check with the maintainers of pglogical. -- With Regards, Amit Kapila.

Re: Synchronizing slots from primary to standby

2024-04-10 Thread Amit Kapila
On Wed, Apr 10, 2024 at 5:28 PM Zhijie Hou (Fujitsu) wrote: > > On Thursday, April 4, 2024 5:37 PM Amit Kapila > wrote: > > > > BTW, while thinking on this one, I > > noticed that in the function LogicalConfirmReceivedLocation(), we first > > update

Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-04-10 Thread Amit Kapila
On Fri, Apr 5, 2024 at 4:59 PM Ajin Cherian wrote: > > On Thu, Apr 4, 2024 at 4:38 PM Amit Kapila wrote: >> >> >> I think this would probably be better than the current situation but >> can we think of a solution to allow toggling the value of two_phase >

Re: Is this a problem in GenericXLogFinish()?

2024-04-10 Thread Amit Kapila
sh_freeovflpage(). The alternative could be: "Ensure that the required flags are set when there are no tuples. See _hash_freeovflpage().". I am also fine if you prefer to go with your proposed comment. Otherwise, the patch looks good to me. -- With Regards, Amit Kapila.

Re: Improve eviction algorithm in ReorderBuffer

2024-04-09 Thread Amit Kapila
pends on the binaryheap > changes yet either. > > - It seems straightforward to repeat the performance tests with whatever > alternative implementations we want to consider. > > My #1 choice would be to write a patch to switch the pairing heap, > performance test that, and revert the binary heap changes. > +1. -- With Regards, Amit Kapila.

Re: Synchronizing slots from primary to standby

2024-04-08 Thread Amit Kapila
sted them > together. > > Here is a small patch to improve the test. > LGTM. I'll push this tomorrow morning unless there are any more comments or suggestions. -- With Regards, Amit Kapila.

Re: pgsql: Fix the intermittent buildfarm failures in 040_standby_failover_

2024-04-08 Thread Amit Kapila
ced in commit ddd5f4f54a. d13ff82319 Fix BF failure in commit 93db6cbda0. 6f3d8d5e7c Fix the intermittent buildfarm failures in 040_standby_failover_slots_sync. -- With Regards, Amit Kapila.

Re: Synchronizing slots from primary to standby

2024-04-08 Thread Amit Kapila
On Mon, Apr 8, 2024 at 9:49 PM Andres Freund wrote: > > On 2024-04-08 16:01:41 +0530, Amit Kapila wrote: > > Pushed. > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder=2024-04-08%2012%3A04%3A27 > > This unfortunately is a commit after > Right, and tha

Re: Synchronizing slots from primary to standby

2024-04-08 Thread Amit Kapila
On Mon, Apr 8, 2024 at 12:19 PM Zhijie Hou (Fujitsu) wrote: > > On Saturday, April 6, 2024 12:43 PM Amit Kapila > wrote: > > On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot > > wrote: > > > > Yeah, that could be the first step. We can probably add an in

Re: Synchronizing slots from primary to standby

2024-04-07 Thread Amit Kapila
On Sun, Apr 7, 2024 at 3:06 AM Andres Freund wrote: > > On 2024-04-06 10:58:32 +0530, Amit Kapila wrote: > > On Sat, Apr 6, 2024 at 10:13 AM Amit Kapila wrote: > > > > > > > There are still a few pending issues to be fixed in this feature but > > otherwise,

Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-04-06 Thread Amit Kapila
nce for the same? If so, we need some comments to explain the same. Can we avoid introducing the new functions like SaveGivenReplicationSlot() and MarkGivenReplicationSlotDirty(), if we do the required work in the caller? -- With Regards, Amit Kapila.

Re: promotion related handling in pg_sync_replication_slots()

2024-04-06 Thread Amit Kapila
Assert(SlotIsLogical(s)); + Assert(s->active_pid == 0); We can add a comment like: "The slot must not be acquired by any process" -- With Regards, Amit Kapila.

Re: Synchronizing slots from primary to standby

2024-04-05 Thread Amit Kapila
On Sat, Apr 6, 2024 at 10:13 AM Amit Kapila wrote: > There are still a few pending issues to be fixed in this feature but otherwise, we have committed all the main patches, so I marked the CF entry corresponding to this work as committed. -- With Regards, Amit Kapila.

Re: Synchronizing slots from primary to standby

2024-04-05 Thread Amit Kapila
On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot wrote: > > On Fri, Apr 05, 2024 at 06:23:10PM +0530, Amit Kapila wrote: > > On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila wrote: > > Thinking more on this, it doesn't seem related to > > c9920a9068eac2e6c8fb34988d18c0b42b9bf81

Re: Synchronizing slots from primary to standby

2024-04-05 Thread Amit Kapila
On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila wrote: > > On Thu, Apr 4, 2024 at 2:59 PM shveta malik wrote: > > > > There is an intermittent BF failure observed at [1] after this commit > > (2ec005b). > > > > Thanks for analyzing and providing the patch. I'll

Re: Synchronizing slots from primary to standby

2024-04-05 Thread Amit Kapila
uld c9920a9068eac2e6c8fb34988d18c0b42b9bf811 be a reason for failure? At this stage, I am not sure so just sharing with others to see if what I am saying sounds logical. I'll think more about this. [1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus=2024-04-05%2004%3A34%3A27 -- With Regards, Amit Kapila.

Re: Is this a problem in GenericXLogFinish()?

2024-04-04 Thread Amit Kapila
On Fri, Apr 5, 2024 at 9:56 AM Michael Paquier wrote: > > On Wed, Feb 07, 2024 at 06:42:21PM +0900, Michael Paquier wrote: > > On Wed, Feb 07, 2024 at 02:08:42PM +0530, Amit Kapila wrote: > > > Thanks for the report and looking into it. Pushed! > > > > Thanks A

Re: promotion related handling in pg_sync_replication_slots()

2024-04-04 Thread Amit Kapila
pts to implement the same. > Thanks for the report and patch. I'll look into it. -- With Regards, Amit Kapila.

Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-04-04 Thread Amit Kapila
On Thu, Apr 4, 2024 at 11:12 AM Bharath Rupireddy wrote: > > On Thu, Apr 4, 2024 at 10:48 AM Amit Kapila wrote: > > > > The v33-0001 looks good to me. I have made minor changes in the > > comments/commit message and removed one part of the test which was a > >

Re: Synchronizing slots from primary to standby

2024-04-04 Thread Amit Kapila
vedLocation(), we first update the disk copy, see comment [1] and then in-memory whereas the same is not true in update_local_synced_slot() for the case when snapshot exists. Now, do we have the same risk here in case of standby? Because I think we will use these xmins while sending the feedback message (in XLogWalRcvSendHSFeedback()). [1] /* * We have to write the changed xmin to disk *before* we change * the in-memory value, otherwise after a crash we wouldn't know * that some catalog tuples might have been removed already. -- With Regards, Amit Kapila.

Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-04-04 Thread Amit Kapila
hours > > passed, the standby is promoted without a restart. If we don't set > > inactive_since to current time in this case in ShutDownSlotSync, the > > inactive timeout invalidation mechanism can kick in immediately. > > > > Thank you for the explanation! I understood the needs. > Do you want to review the v34_0001* further or shall I proceed with the commit of the same? -- With Regards, Amit Kapila.

Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-04-03 Thread Amit Kapila
nsactions are present? Can you please summarize the reason for the problems in doing that and the solutions, if any? -- With Regards, Amit Kapila.

Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-04-03 Thread Amit Kapila
ive_since TAP tests. > The v33-0001 looks good to me. I have made minor changes in the comments/commit message and removed one part of the test which was a bit confusing and didn't seem to add much value. Let me know what you think of the attached? -- With Regards, Amit Kapila. v34-0001-Allow-synced-slots-to-have-their-inactive_since.patch Description: Binary data

Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-04-03 Thread Amit Kapila
as well? Similarly, please check other tests. 2. +=item $node->get_slot_inactive_since_value(self, slot_name, reference_time) + +Get inactive_since column value for a given replication slot validating it +against optional reference time. + +=cut + +sub get_slot_inactive_since_value I see that all callers validate against reference time. It is better to name it validate_slot_inactive_since rather than using get_* as the main purpose is to validate the passed value. -- With Regards, Amit Kapila.

Re: Synchronizing slots from primary to standby

2024-04-03 Thread Amit Kapila
On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila wrote: > > On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy > wrote: > > > I quickly looked at v8, and have a nit, rest all looks good. > > > > +if (DecodingContextReady(ctx

Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-04-03 Thread Amit Kapila
expect any other concurrent slot activity. The first reason seems good enough. One other observation: --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -42,6 +42,7 @@ #include "access/transam.h" #include "access/xlog_internal.h" #include "access/xlogrecovery.h" +#include "access/xlogutils.h" Is there a reason for this inclusion? I don't see any change which should need this one. -- With Regards, Amit Kapila.

Re: Synchronizing slots from primary to standby

2024-04-02 Thread Amit Kapila
On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy wrote: > > On Wed, Apr 3, 2024 at 9:04 AM Amit Kapila wrote: > > > > > I'd just rename LogicalSlotAdvanceAndCheckSnapState(XLogRecPtr > > > moveto, bool *found_consistent_snapshot) to > > > pg_logical_rep

Re: Synchronizing slots from primary to standby

2024-04-02 Thread Amit Kapila
s safe? > It is not specific to slot sync worker but specific to fast_forward mode. There is already a comment "We need to access the system tables during decoding to build the logical changes unless we are in fast_forward mode where no changes are generated." telling why it is safe. The point is we need database access to access system tables while generating the logical changes and in fast-forward mode, we don't generate logical changes so this check is not required. Do let me if you have a different understanding or if my understanding is incorrect. -- With Regards, Amit Kapila.

Re: Synchronizing slots from primary to standby

2024-04-02 Thread Amit Kapila
other idea to make such tests predictable is to add a developer-specific GUC say debug_bg_log_standby_snapshot or something like that but injection point sounds like a better idea. -- With Regards, Amit Kapila.

Re: Synchronizing slots from primary to standby

2024-04-01 Thread Amit Kapila
On Mon, Apr 1, 2024 at 6:58 PM Bertrand Drouvot wrote: > > On Mon, Apr 01, 2024 at 05:04:53PM +0530, Amit Kapila wrote: > > On Mon, Apr 1, 2024 at 2:51 PM Bertrand Drouvot > > wrote: > > > Then there is no need to call WaitForStandbyConfirmation()

Re: Synchronizing slots from primary to standby

2024-04-01 Thread Amit Kapila
ugh for now (currently there is only one sync worker so that > scenario > is likely to happen). > Yeah, we could do that but I am not sure how much it can help. I guess we could do some tests to see if it helps. -- With Regards, Amit Kapila.

Re: Synchronizing slots from primary to standby

2024-04-01 Thread Amit Kapila
On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu) wrote: > > On Friday, March 29, 2024 2:50 PM Amit Kapila wrote: > > > > > > > > > 2. > > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr moveto, > > + bool *found_consistent_point

Re: Synchronizing slots from primary to standby

2024-03-31 Thread Amit Kapila
> SESSION1, TXN1 > BEGIN; > create table dummy2(a int); > > SESSION2, TXN2 > SELECT pg_log_standby_snapshot(); > > SESSION1, TXN1 > COMMIT; > > ./psql -d postgres -p 5432 -c "SELECT > pg_replication_slot_advance('lrep_sync_slot', pg_current_wal_lsn());" > After this step and before the next, did you ensure that the slot sync has synced the latest confirmed_flush/restart LSNs? You can query: "select slot_name,restart_lsn, confirmed_flush_lsn from pg_replication_slots;" to ensure the same on both the primary and standby. -- With Regards, Amit Kapila.

Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-03-31 Thread Amit Kapila
On Fri, Mar 29, 2024 at 6:17 PM Bertrand Drouvot wrote: > > On Fri, Mar 29, 2024 at 03:03:01PM +0530, Amit Kapila wrote: > > On Fri, Mar 29, 2024 at 11:49 AM Bertrand Drouvot > > wrote: > > > > > > On Fri, Mar 29, 2024 at 09:39:31AM +0530, Amit Kapila wrote: &

Re: Improve eviction algorithm in ReorderBuffer

2024-03-29 Thread Amit Kapila
n finding the largest transaction whether there are top-level or top-level plus subtransactions? This comment indicates it is only effective when there are subtransactions. -- With Regards, Amit Kapila.

Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-03-29 Thread Amit Kapila
On Fri, Mar 29, 2024 at 11:49 AM Bertrand Drouvot wrote: > > On Fri, Mar 29, 2024 at 09:39:31AM +0530, Amit Kapila wrote: > > > > Commit message states: "why we can't just update inactive_since for > > synced slots on the standby with the value received from r

Re: Synchronizing slots from primary to standby

2024-03-29 Thread Amit Kapila
ent point, so that user can ensure the slot can > > be > > used after promotion once persisted. > > Right, but do we need to do so for all the sync slots? Would a single hidden > slot be enough? > Even if we mark one of the synced slots as persistent without reaching a consistent state, it could create a problem after promotion. And, how a single hidden slot would serve the purpose, different synced slots will have different restart/confirmed_flush LSN and we won't be able to perform advancing for those using a single slot. For example, say for first synced slot, it has not reached a consistent state and then how can it try for the second slot? This sounds quite tricky to make work. We should go with something simple where the chances of introducing bugs are lesser. -- With Regards, Amit Kapila.

Re: Synchronizing slots from primary to standby

2024-03-29 Thread Amit Kapila
t makes more consistent to call requiredXmin() as well, per [2]: > Yeah, I also think it is okay to call for the sake of consistency with pg_replication_slot_advance(). -- With Regards, Amit Kapila.

Re: Synchronizing slots from primary to standby

2024-03-29 Thread Amit Kapila
nessForDecoding(moveto, ready_for_decoding) with the same functionality as your patch has for pg_logical_replication_slot_advance() and then invoke it both from pg_logical_replication_slot_advance and slotsync.c. The function name is too big, we can think of a shorter name. Any ideas? -- With Regards, Amit Kapila.

Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-03-28 Thread Amit Kapila
ring the slot we can have some minimal pre-checks to ensure whether we need to update the slot or not. [1] - https://www.postgresql.org/message-id/OS0PR01MB571615D35F486080616CA841943A2%40OS0PR01MB5716.jpnprd01.prod.outlook.com -- With Regards, Amit Kapila.

Re: Support logical replication of DDLs

2024-03-28 Thread Amit Kapila
ld close either Returned With Feedback or Withdrawn as this requires a lot of work. -- With Regards, Amit Kapila.

Re: Synchronizing slots from primary to standby

2024-03-28 Thread Amit Kapila
can we think of a better way to fix this issue? > Should we create a 17 open item [1]? > > [1]: https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items > Yes, we can do that. -- With Regards, Amit Kapila.

Re: Synchronizing slots from primary to standby

2024-03-28 Thread Amit Kapila
r as B1. C. For 3, we would sync the slot, but the behavior should be the same as B. Thoughts? -- With Regards, Amit Kapila.

Re: pg_upgrade and logical replication

2024-03-27 Thread Amit Kapila
ter enabling the subscription. Thanks -- With Regards, Amit Kapila.

Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-03-26 Thread Amit Kapila
? > Maybe we should prevent that otherwise some of the slots are synced > and the standby gets promoted while others are yet-to-be-synced. > We should do something about it but that shouldn't be done in this patch. We can handle it separately and then add such an assert. -- With Regards, Amit Kapila.

Re: pgsql: Track last_inactive_time in pg_replication_slots.

2024-03-26 Thread Amit Kapila
rver end due to such slots. -- With Regards, Amit Kapila.

Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-03-26 Thread Amit Kapila
quot;reset" is the right word, what about > slot_sync_shutdown_update()? > *shutdown_update() sounds generic. How about update_synced_slots_inactive_time()? I think it is a bit longer but conveys the meaning. -- With Regards, Amit Kapila.

Re: pgsql: Track last_inactive_time in pg_replication_slots.

2024-03-26 Thread Amit Kapila
On Tue, Mar 26, 2024 at 2:11 PM Alvaro Herrera wrote: > > On 2024-Mar-26, Amit Kapila wrote: > > > On Tue, Mar 26, 2024 at 1:09 PM Alvaro Herrera > > wrote: > > > On 2024-Mar-26, Amit Kapila wrote: > > > > I would also like to solicit your opinion on

Re: pgsql: Track last_inactive_time in pg_replication_slots.

2024-03-26 Thread Amit Kapila
On Tue, Mar 26, 2024 at 1:09 PM Alvaro Herrera wrote: > > On 2024-Mar-26, Amit Kapila wrote: > > > We have a consensus on inactive_since, so I'll make that change. > > Sounds reasonable. So this is a timestamptz if the slot is inactive, > NULL if active, right? > Yes.

Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-03-26 Thread Amit Kapila
But why? This is exactly what we discussed in another thread where we agreed to update inactive_since even for sync slots. In each sync cycle, we acquire/release the slot, so the inactive_since gets updated. See synchronize_one_slot(). -- With Regards, Amit Kapila.

Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-03-25 Thread Amit Kapila
quiring spinlock which is unacceptable. Now, this may not happen in practice as the callers won't pass such a combination but still, this functionality should be improved. -- With Regards, Amit Kapila.

Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-03-25 Thread Amit Kapila
it as a GUC also has some valid use cases as pointed out by you but I am not sure having both at slot level and at GUC level is required. I was a bit inclined to have it at slot level for now and then based on some field usage report we can later add GUC as well. -- With Regards, Amit Kapila.

Re: speed up a logical replica setup

2024-03-25 Thread Amit Kapila
On Tue, Mar 26, 2024 at 8:27 AM Euler Taveira wrote: > > On Mon, Mar 25, 2024, at 11:33 PM, Amit Kapila wrote: > > On Mon, Mar 25, 2024 at 5:25 PM Peter Eisentraut wrote: > > > > I have committed your version v33. I did another pass over the > > identifier and lit

Re: speed up a logical replica setup

2024-03-25 Thread Amit Kapila
objects should be removed (either optionally or always), otherwise, it can lead to a new subscriber not being able to restart or getting some unwarranted data. [1] - https://www.postgresql.org/message-id/CAExHW5t4ew7ZrgcDdTv7YmuG7LVQT1ZaEny_EvtngHtEBNyjcQ%40mail.gmail.com -- With Regards, Amit Kapila.

  1   2   3   4   5   6   7   8   9   10   >