On Thursday, March 28, 2024 10:02 PM Zhijie Hou (Fujitsu) <houzj.f...@fujitsu.com> wrote: > > On Thursday, March 28, 2024 7:32 PM Amit Kapila <amit.kapil...@gmail.com> > wrote: > > > > On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) > > <houzj.f...@fujitsu.com> > > wrote: > > > > > > When analyzing one BF error[1], we find an issue of slotsync: Since > > > we don't perform logical decoding for the synced slots when syncing > > > the lsn/xmin of slot, no logical snapshots will be serialized to > > > disk. So, when user starts to use these synced slots after > > > promotion, it needs to re-build the consistent snapshot from the > > > restart_lsn if the > > > WAL(xl_running_xacts) at restart_lsn position indicates that there > > > are running transactions. This however could cause the data that > > > before the > > consistent point to be missed[2]. > > > > > > This issue doesn't exist on the primary because the snapshot at > > > restart_lsn should have been serialized to disk > > > (SnapBuildProcessRunningXacts -> SnapBuildSerialize), so even if the > > > logical decoding restarts, it can find consistent snapshot > > > immediately at > > restart_lsn. > > > > > > To fix this, we could use the fast forward logical decoding to > > > advance the synced slot's lsn/xmin when syncing these values instead > > > of directly updating the slot's info. This way, the snapshot will be > > > serialized to > > disk when decoding. > > > If we could not reach to the consistent point at the remote > > > restart_lsn, the slot is marked as temp and will be persisted once > > > it reaches the consistent point. I am still analyzing the fix and > > > will share once > > ready. > > > > > > > Yes, we can use this but one thing to note is that > > CreateDecodingContext() will expect that the slot's and current > > database are the same. I think the reason for that is we need to check > > system tables of the current database while decoding and sending data > > to the output_plugin which won't be a requirement for the fast_forward > > case. So, we need to skip that check in fast_forward mode. > > Agreed. > > > > > Next, I was thinking about the case of the first time updating the > > restart and confirmed_flush LSN while syncing the slots. I think we > > can keep the current logic as it is based on the following analysis. > > > > For each logical slot, cases possible on the primary: > > 1. The restart_lsn doesn't have a serialized snapshot and hasn't yet > > reached the consistent point. > > 2. The restart_lsn doesn't have a serialized snapshot but has reached > > a consistent point. > > 3. The restart_lsn has a serialized snapshot which means it has > > reached a consistent point as well. > > > > Considering we keep the logic to reserve initial WAL positions the > > same as the current (Reserve WAL for the currently active local slot > > using the specified WAL location (restart_lsn). If the given WAL > > location has been removed, reserve WAL using the oldest existing WAL > > segment.), I could think of the below > > scenarios: > > A. For 1, we shouldn't sync the slot as it still wouldn't have been > > marked persistent on the primary. > > B. For 2, we would sync the slot > > B1. If remote_restart_lsn >= local_resart_lsn, then advance the > > slot by calling pg_logical_replication_slot_advance(). > > B11. If we reach consistent point, then it should be okay > > because after promotion as well we should reach consistent point. > > B111. But again is it possible that there is some xact > > that comes before consistent_point on primary and the same is after > > consistent_point on standby? This shouldn't matter as we will start > > decoding transactions after confirmed_flush_lsn which would be the same on > primary and standby. > > B22. If we haven't reached consistent_point, then we won't mark > > the slot as persistent, and at the next sync we will do the same till > > it reaches consistent_point. At that time, the situation will be similar to > > B11. > > B2. If remote_restart_lsn < local_restart_lsn, then we will wait > > for the next sync cycle and keep the slot as temporary. Once in the > > next or some consecutive sync cycle, we reach the condition > > remote_restart_lsn >= local_restart_lsn, we will proceed to advance > > the slot and we should have the same behavior as B1. > > C. For 3, we would sync the slot, but the behavior should be the same as B. > > > > Thoughts? > > Looks reasonable to me. > > Here is the patch based on above lines. > I am also testing and verifying the patch locally.
Attach a new version patch which fixed an un-initialized variable issue and added some comments. Also, temporarily enable DEBUG2 for the 040 tap-test so that we can analyze the possible CFbot failures easily. Best Regards, Hou zj
v2-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
Description: v2-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch