On 7/24/23 12:40, Amit Kapila wrote: > On Wed, Jul 5, 2023 at 8:21 PM Ashutosh Bapat > <ashutosh.bapat....@gmail.com> wrote: >> >> 0005, 0006 and 0007 are all related to the initial sequence sync. [3] >> resulted in 0007 and I think we need it. That leaves 0005 and 0006 to >> be reviewed in this response. >> >> I followed the discussion starting [1] till [2]. The second one >> mentions the interlock mechanism which has been implemented in 0005 >> and 0006. While I don't have an objection to allowing LOCKing a >> sequence using the LOCK command, I am not sure whether it will >> actually work or is even needed. >> >> The problem described in [1] seems to be the same as the problem >> described in [2]. In both cases we see the sequence moving backwards >> during CATCHUP. At the end of catchup the sequence is in the right >> state in both the cases. >> > > I think we could see backward sequence value even after the catchup > phase (after the sync worker is exited and or the state of rel is > marked as 'ready' in pg_subscription_rel). The point is that there is > no guarantee that we will process all the pending WAL before > considering the sequence state is 'SYNCDONE' and or 'READY'. For > example, after copy_sequence, I see values like: > > postgres=# select * from s; > last_value | log_cnt | is_called > ------------+---------+----------- > 165 | 0 | t > (1 row) > postgres=# select nextval('s'); > nextval > --------- > 166 > (1 row) > postgres=# select nextval('s'); > nextval > --------- > 167 > (1 row) > postgres=# select currval('s'); > currval > --------- > 167 > (1 row) > > Then during the catchup phase: > postgres=# select * from s; > last_value | log_cnt | is_called > ------------+---------+----------- > 33 | 0 | t > (1 row) > postgres=# select * from s; > last_value | log_cnt | is_called > ------------+---------+----------- > 66 | 0 | t > (1 row) > > postgres=# select * from pg_subscription_rel; > srsubid | srrelid | srsubstate | srsublsn > ---------+---------+------------+----------- > 16394 | 16390 | r | 0/16374E8 > 16394 | 16393 | s | 0/1637700 > (2 rows) > > postgres=# select * from pg_subscription_rel; > srsubid | srrelid | srsubstate | srsublsn > ---------+---------+------------+----------- > 16394 | 16390 | r | 0/16374E8 > 16394 | 16393 | r | 0/1637700 > (2 rows) > > Here Sequence relid id 16393. You can see sequence state is marked as ready. >
Right, but "READY" just means the apply caught up if the LSN where the sync finished ... > postgres=# select * from s; > last_value | log_cnt | is_called > ------------+---------+----------- > 66 | 0 | t > (1 row) > > Even after that, see below the value of the sequence is still not > caught up. Later, when the apply worker processes all the WAL, the > sequence state will be caught up. > And how is this different from what tablesync does for tables? For that 'r' also does not mean it's fully caught up, IIRC. What matters is whether the sequence since this moment can go back. And I don't think it can, because that would require replaying changes from before we did copy_sequence ... > postgres=# select * from s; > last_value | log_cnt | is_called > ------------+---------+----------- > 165 | 0 | t > (1 row) > > So, there will be a window where the sequence won't be caught up for a > certain period of time and any usage of it (even after the sync is > finished) during that time could result in inconsistent behaviour. > > The other question is whether it is okay to allow the sequence to go > backwards even during the initial sync phase? The reason I am asking > this question is that for the time sequence value moves backwards, one > is allowed to use it on the subscriber which will result in using > out-of-sequence values. For example, immediately, after copy_sequence > the values look like this: > postgres=# select * from s; > last_value | log_cnt | is_called > ------------+---------+----------- > 133 | 32 | t > (1 row) > postgres=# select nextval('s'); > nextval > --------- > 134 > (1 row) > postgres=# select currval('s'); > currval > --------- > 134 > (1 row) > > But then during the sync phase, it can go backwards and one is allowed > to use it on the subscriber: > postgres=# select * from s; > last_value | log_cnt | is_called > ------------+---------+----------- > 66 | 0 | t > (1 row) > postgres=# select nextval('s'); > nextval > --------- > 67 > (1 row) > As I wrote earlier, I think the agreement was we make no guarantees about what happens during the sync. Also, not sure what you mean by "no one is allowed to use it on subscriber" - that is only allowed after a failover/switchover, after sequence sync completes. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company