Re: Reviving lost replication slots
On Thu, Nov 10, 2022 at 4:12 PM sirisha chamarthi wrote:
>
> On Wed, Nov 9, 2022 at 12:32 AM Kyotaro Horiguchi wrote:
>>
>> I don't think walsenders fetching segments from archive is totally
>> stupid. With that feature, we can use fast and expensive but small
>> storage for pg_wal, while avoiding replication from dying even in
>> emergency.
>
> Thanks! If there is a general agreement on this in this forum, I would
> like to start working on this patch.

I think starting with establishing/summarizing the problem, design
approaches, implications etc. is a better idea than a patch. It might
invite more thoughts from the hackers.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Reviving lost replication slots
On Thu, Nov 10, 2022 at 4:07 PM sirisha chamarthi wrote:
>
> On Wed, Nov 9, 2022 at 2:37 AM Amit Kapila wrote:
>>
>> On Fri, Nov 4, 2022 at 1:40 PM sirisha chamarthi wrote:
>> >
>> > Is the intent of setting restart_lsn to InvalidXLogRecPtr to
>> > disallow reviving the slot?
>> >
>>
>> I think the intent is to compute the correct value for
>> replicationSlotMinLSN, as we use restart_lsn for it and using the
>> invalidated slot's restart_lsn value for it doesn't make sense.
>
> Correct. If a slot is invalidated (lost), then shouldn't we ignore the
> slot when computing the catalog_xmin? I don't see it being set to
> InvalidTransactionId in ReplicationSlotsComputeRequiredXmin. Attached a
> small patch to address this; the output after the patch is as shown
> below.

I think you forgot to attach the patch. However, I suggest you start a
separate thread for this because the patch you are talking about here
seems to be for an existing problem.

--
With Regards,
Amit Kapila.
Re: Reviving lost replication slots
On Wed, Nov 9, 2022 at 12:32 AM Kyotaro Horiguchi wrote:

> I don't think walsenders fetching segments from archive is totally
> stupid. With that feature, we can use fast and expensive but small
> storage for pg_wal, while avoiding replication from dying even in
> emergency.

Thanks! If there is a general agreement on this in this forum, I would
like to start working on this patch.

> At Tue, 8 Nov 2022 19:39:58 -0800, sirisha chamarthi <
> sirichamarth...@gmail.com> wrote in
> > > If it's a streaming replication slot, the standby will anyway jump to
> > > archive mode ignoring the replication slot and the slot will never be
> > > usable again unless somebody creates a new replication slot and
> > > provides it to the standby for reuse.
> > > If it's a logical replication slot, the subscriber will start to
> > > diverge from the publisher and the slot will have to be revived
> > > manually i.e. created again.
> >
> > Physical slots can be revived with the standby downloading the WAL from
> > the archive directly. This patch is helpful for the logical slots.
>
> However, supposing that WalSndSegmentOpen() fetches segments from
> archive as the fallback and that succeeds, the slot can survive
> missing WAL in pg_wal in the first place. So this patch doesn't seem
> to be needed for the purpose.

Agree on this. If we add the proposed support, we don't need this patch.

> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
Re: Reviving lost replication slots
On Wed, Nov 9, 2022 at 2:37 AM Amit Kapila wrote:

> On Fri, Nov 4, 2022 at 1:40 PM sirisha chamarthi wrote:
> >
> > Is the intent of setting restart_lsn to InvalidXLogRecPtr to disallow
> > reviving the slot?
> >
>
> I think the intent is to compute the correct value for
> replicationSlotMinLSN, as we use restart_lsn for it and using the
> invalidated slot's restart_lsn value for it doesn't make sense.

Correct. If a slot is invalidated (lost), then shouldn't we ignore the
slot when computing the catalog_xmin? I don't see it being set to
InvalidTransactionId in ReplicationSlotsComputeRequiredXmin. Attached a
small patch to address this; the output after the patch is as shown
below.

postgres=# select * from pg_replication_slots;
 slot_name |    plugin     | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
-----------+---------------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 s2        | test_decoding | logical   |      5 | postgres | f         | f      |            |      |          771 | 0/30466368  | 0/304663A0          | reserved   |      28903824 | f
(1 row)

postgres=# create table t2(c int, c1 char(100));
CREATE TABLE
postgres=# drop table t2;
DROP TABLE
postgres=# vacuum pg_class;
VACUUM
postgres=# select n_dead_tup from pg_stat_all_tables where relname = 'pg_class';
 n_dead_tup
------------
          2
(1 row)

postgres=# select * from pg_stat_replication;
 pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state | reply_time
-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+--------------+-------+----------+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+------------
(0 rows)

postgres=# insert into t1 select * from t1;
INSERT 0 2097152
postgres=# checkpoint;
CHECKPOINT
postgres=# select * from pg_replication_slots;
 slot_name |    plugin     | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
-----------+---------------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 s2        | test_decoding | logical   |      5 | postgres | f         | f      |            |      |          771 |             | 0/304663A0          | lost       |               | f
(1 row)

postgres=# vacuum pg_class;
VACUUM
postgres=# select n_dead_tup from pg_stat_all_tables where relname = 'pg_class';
 n_dead_tup
------------
          0
(1 row)

> --
> With Regards,
> Amit Kapila.
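[Editorial note: the behavior being patched above can be sketched as an
illustrative Python model. This is not PostgreSQL's C code; the function
and data-structure names here are hypothetical stand-ins for
ReplicationSlotsComputeRequiredXmin, InvalidXLogRecPtr and
InvalidTransactionId, and XID wraparound semantics are ignored.]

```python
# Illustrative model (hypothetical, not PostgreSQL's actual C code) of the
# proposed fix: when computing the required catalog_xmin across slots, a slot
# whose restart_lsn has been invalidated ("lost") is skipped, so vacuum can
# reclaim the catalog rows that slot was holding back.

INVALID_LSN = None  # stands in for InvalidXLogRecPtr
INVALID_XID = None  # stands in for InvalidTransactionId

def compute_required_catalog_xmin(slots):
    """Return the minimum catalog_xmin over all non-invalidated slots,
    or INVALID_XID when no valid slot constrains it.
    NOTE: plain '<' is a simplification; real XID comparison handles
    wraparound (TransactionIdPrecedes)."""
    required = INVALID_XID
    for slot in slots:
        if slot["restart_lsn"] is INVALID_LSN:
            continue  # lost slot: ignore its catalog_xmin
        xmin = slot["catalog_xmin"]
        if xmin is None:
            continue
        if required is INVALID_XID or xmin < required:
            required = xmin
    return required

slots = [
    {"name": "s1", "restart_lsn": 0x30466368, "catalog_xmin": 771},
    {"name": "s2", "restart_lsn": INVALID_LSN, "catalog_xmin": 600},  # lost
]
print(compute_required_catalog_xmin(slots))  # 771: s2 no longer pins xmin 600
```

With this model, the lost slot s2 no longer holds catalog_xmin back at
600, matching the vacuum behavior shown in the session above.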
Re: Reviving lost replication slots
On Fri, Nov 4, 2022 at 1:40 PM sirisha chamarthi wrote:
>
> Is the intent of setting restart_lsn to InvalidXLogRecPtr to disallow
> reviving the slot?
>

I think the intent is to compute the correct value for
replicationSlotMinLSN, as we use restart_lsn for it and using the
invalidated slot's restart_lsn value for it doesn't make sense.

--
With Regards,
Amit Kapila.
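[Editorial note: the computation referred to here can be modeled roughly
in Python. This is an illustrative sketch only; the function name is
hypothetical and the real code operates on XLogRecPtr values in shared
memory.]

```python
# Rough model of computing the minimum restart_lsn across slots: a slot
# whose restart_lsn is InvalidXLogRecPtr (modeled as None) must be skipped,
# otherwise the invalidated slot would keep pinning WAL it can never use.

def replication_slot_min_lsn(restart_lsns):
    """Minimum restart_lsn over slots, ignoring invalidated ones."""
    valid = [lsn for lsn in restart_lsns if lsn is not None]
    return min(valid) if valid else None

# restart_lsn values; None marks an invalidated (lost) slot
print(hex(replication_slot_min_lsn([0x30466368, None, 0x304663A0])))
```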
Re: Reviving lost replication slots
On Wed, Nov 9, 2022 at 3:53 PM Amit Kapila wrote:
>
> On Wed, Nov 9, 2022 at 3:00 PM Bharath Rupireddy wrote:
> >
> > On Wed, Nov 9, 2022 at 2:02 PM Kyotaro Horiguchi wrote:
> > >
> > > I don't think walsenders fetching segments from archive is totally
> > > stupid. With that feature, we can use fast and expensive but small
> > > storage for pg_wal, while avoiding replication from dying even in
> > > emergency.
> >
> > It seems like a useful feature to have at least as an option and it
> > saves a lot of work - failovers, expensive rebuilds of
> > standbys/subscribers, manual interventions etc.
> >
> > If you're saying that even the walsenders serving logical replication
> > subscribers would go fetch from the archive location for the removed
> > WAL files, it mandates enabling archiving on the subscribers.
>
> Why is archiving on subscribers required? Won't it be sufficient if
> that is enabled on the publisher where we have walsender?

Ugh. A typo. I meant it mandates enabling archiving on publishers.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Reviving lost replication slots
On Wed, Nov 9, 2022 at 3:00 PM Bharath Rupireddy wrote:
>
> On Wed, Nov 9, 2022 at 2:02 PM Kyotaro Horiguchi wrote:
> >
> > I don't think walsenders fetching segments from archive is totally
> > stupid. With that feature, we can use fast and expensive but small
> > storage for pg_wal, while avoiding replication from dying even in
> > emergency.
>
> It seems like a useful feature to have at least as an option and it
> saves a lot of work - failovers, expensive rebuilds of
> standbys/subscribers, manual interventions etc.
>
> If you're saying that even the walsenders serving logical replication
> subscribers would go fetch from the archive location for the removed
> WAL files, it mandates enabling archiving on the subscribers.
>

Why is archiving on subscribers required? Won't it be sufficient if
that is enabled on the publisher where we have walsender?

--
With Regards,
Amit Kapila.
Re: Reviving lost replication slots
On Wed, Nov 9, 2022 at 2:02 PM Kyotaro Horiguchi wrote:
>
> I don't think walsenders fetching segments from archive is totally
> stupid. With that feature, we can use fast and expensive but small
> storage for pg_wal, while avoiding replication from dying even in
> emergency.

It seems like a useful feature to have at least as an option and it
saves a lot of work - failovers, expensive rebuilds of
standbys/subscribers, manual interventions etc.

If you're saying that even the walsenders serving logical replication
subscribers would go fetch from the archive location for the removed
WAL files, it mandates enabling archiving on the subscribers. And we
know that archiving is not cheap and has its own advantages and
disadvantages, so the feature may or may not help.

If you're saying that only the walsenders serving streaming replication
standbys would go fetch from the archive location for the removed WAL
files, it's easy to implement, however it is not a complete feature and
doesn't solve the problem for logical replication.

With the feature, it'll be something like 'you, as primary/publisher,
archive the WAL files and when you don't have them, you'll restore
them'. It may not sound elegant, however, it can solve the lost
replication slots problem. And the cost of restoring WAL files from the
archive might further slow down the replication, thus increasing the
replication lag. One also needs to think through how many such WAL
files are restored and kept, whether they'll be kept in pg_wal or some
other directory, and how disk-full conditions, fetching too-old or too
many WAL files for replication slots lagging far behind, removal of
unnecessary WAL files etc. will be handled. I'm not sure about other
implications at this point of time. Perhaps, implementing this feature
as a core/external extension by introducing segment_open() or other
necessary hooks might be worth it.

If implemented in some way, I think the scope of the replication slot
invalidation/max_slot_wal_keep_size feature gets reduced or it can be
removed completely, no?

> However, supposing that WalSndSegmentOpen() fetches segments from
> archive as the fallback and that succeeds, the slot can survive
> missing WAL in pg_wal in the first place. So this patch doesn't seem
> to be needed for the purpose.

That is a simple solution one can think of and provide for streaming
replication standbys, however, is it worth implementing it in the core
as explained above?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
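[Editorial note: the fallback being debated can be sketched as a toy
Python model. This is an assumption-laden illustration, not the proposed
implementation: the function names are made up, and the restore callback
stands in for something like restore_command run by the walsender.]

```python
# Toy sketch of the fallback: when a WAL segment is missing from pg_wal,
# restore it from the archive into a scratch location, stream from it, and
# throw it away afterwards (so restored segments don't fill the disk).

import os
import tempfile

def open_wal_segment(segname, pg_wal_dir, restore_from_archive):
    """Try pg_wal first; on a miss, restore the segment from the archive
    into a temporary staging directory. Returns (file, staged_path);
    staged_path is None when the segment came from pg_wal."""
    local = os.path.join(pg_wal_dir, segname)
    if os.path.exists(local):
        return open(local, "rb"), None  # normal path: no cleanup needed
    staged = os.path.join(tempfile.mkdtemp(prefix="walsender_restore_"), segname)
    restore_from_archive(segname, staged)  # analogous to restore_command
    return open(staged, "rb"), staged      # caller deletes 'staged' when done

# Toy archive: "restoring" just writes recognizable dummy bytes.
def fake_restore(segname, dest):
    with open(dest, "wb") as f:
        f.write(b"WAL:" + segname.encode())

with tempfile.TemporaryDirectory() as pg_wal:  # empty pg_wal: forces fallback
    f, staged = open_wal_segment("000000010000000000000042", pg_wal, fake_restore)
    data = f.read()
    f.close()
    if staged:
        os.unlink(staged)  # discard the segment once processed
    print(data)
```

The open questions in the message above (where to stage the files, when
to remove them, how to bound their number) are exactly the policy knobs
this sketch leaves out.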
Re: Reviving lost replication slots
I don't think walsenders fetching segments from archive is totally
stupid. With that feature, we can use fast and expensive but small
storage for pg_wal, while avoiding replication from dying even in
emergency.

At Tue, 8 Nov 2022 19:39:58 -0800, sirisha chamarthi wrote in
> > If it's a streaming replication slot, the standby will anyway jump to
> > archive mode ignoring the replication slot and the slot will never be
> > usable again unless somebody creates a new replication slot and
> > provides it to the standby for reuse.
> > If it's a logical replication slot, the subscriber will start to
> > diverge from the publisher and the slot will have to be revived
> > manually i.e. created again.
>
> Physical slots can be revived with the standby downloading the WAL from
> the archive directly. This patch is helpful for the logical slots.

However, supposing that WalSndSegmentOpen() fetches segments from
archive as the fallback and that succeeds, the slot can survive
missing WAL in pg_wal in the first place. So this patch doesn't seem
to be needed for the purpose.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Reviving lost replication slots
On Mon, Nov 7, 2022 at 11:17 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Tue, Nov 8, 2022 at 12:08 PM sirisha chamarthi wrote:
> >
> > On Fri, Nov 4, 2022 at 11:02 PM Amit Kapila wrote:
> >>
> >> On Fri, Nov 4, 2022 at 1:40 PM sirisha chamarthi wrote:
> >> >
> >> > A replication slot can be lost when a subscriber is not able to
> >> > catch up with the load on the primary and the WAL to catch up
> >> > exceeds max_slot_wal_keep_size. When this happens, the target has
> >> > to be reseeded (pg_dump) from scratch and this can take longer. I
> >> > am investigating the options to revive a lost slot.
> >> >
> >>
> >> Why would one set max_slot_wal_keep_size in the first place if they
> >> care for WAL more than that?
> >
> > Disk full is a typical case where we can't wait until the logical
> > slots catch up before truncating the log.
>
> If max_slot_wal_keep_size is set appropriately and the replication
> lag is monitored properly along with some automatic actions such as
> replacing/rebuilding the standbys or subscribers (which may not be
> easy and cheap though), the chances of hitting the "lost replication"
> problem become less, but not zero always.

pg_dump and pg_restore can take several hours to days on a large
database. Keeping the WAL in the pg_wal folder (faster, smaller and
costly disks?) is not always an option.

> >> If you have a case where you want to handle this case for some
> >> particular slot (where you are okay with the invalidation of other
> >> slots exceeding max_slot_wal_keep_size) then the other possibility
> >> could be to have a similar variable at the slot level, but not sure
> >> if that is a good idea because you haven't presented any such case.
> >
> > IIUC, the ability to fetch WAL from the archive as a fallback
> > mechanism should automatically take care of all the lost slots. Do you
> > see a need to take care of a specific slot? If the idea is not to
> > download the WAL files in the pg_wal directory, they can be placed in
> > a slot-specific folder (data/pg_replslot//) until they are needed
> > while decoding and can be removed.
>
> Is the idea here the core copying back the WAL files from the archive?
> If yes, I think it is not something the core needs to do. This very
> well fits the job of an extension or an external module that revives
> the lost replication slots by copying WAL files from the archive
> location.

The current code throws an error that the slot is lost because the
restart_lsn is set to an invalid LSN when the WAL is truncated by the
checkpointer. In order to build an external service that can revive a
lost slot, at the minimum the attached patch is needed.

> Having said above, what's the best way to revive a lost replication
> slot today? Any automated way exists today? It seems like
> pg_replication_slot_advance() doesn't do anything for the
> invalidated/lost slots.

If the WAL is available in the pg_wal directory, the replication stream
resumes normally when the client connects, with the patch I posted.

> If it's a streaming replication slot, the standby will anyway jump to
> archive mode ignoring the replication slot and the slot will never be
> usable again unless somebody creates a new replication slot and
> provides it to the standby for reuse.
> If it's a logical replication slot, the subscriber will start to
> diverge from the publisher and the slot will have to be revived
> manually i.e. created again.

Physical slots can be revived with the standby downloading the WAL from
the archive directly. This patch is helpful for the logical slots.

> --
> Bharath Rupireddy
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
Re: Reviving lost replication slots
On Tue, Nov 8, 2022 at 1:36 AM Amit Kapila wrote:

> On Tue, Nov 8, 2022 at 12:08 PM sirisha chamarthi wrote:
> >
> > On Fri, Nov 4, 2022 at 11:02 PM Amit Kapila wrote:
> >>
> >> On Fri, Nov 4, 2022 at 1:40 PM sirisha chamarthi wrote:
> >> >
> >> > A replication slot can be lost when a subscriber is not able to
> >> > catch up with the load on the primary and the WAL to catch up
> >> > exceeds max_slot_wal_keep_size. When this happens, the target has
> >> > to be reseeded (pg_dump) from scratch and this can take longer. I
> >> > am investigating the options to revive a lost slot.
> >> >
> >>
> >> Why would one set max_slot_wal_keep_size in the first place if they
> >> care for WAL more than that?
> >
> > Disk full is a typical case where we can't wait until the logical
> > slots catch up before truncating the log.
>
> Ideally, in such a case the subscriber should fall back to the
> physical standby of the publisher but unfortunately, we don't yet have
> a functionality where subscribers can continue logical replication
> from physical standby. Do you think if we had such functionality it
> would serve our purpose?

Don't think streaming from standby helps, as the disk layout is
expected to remain the same on physical standby and primary.

> >> If you have a case where you want to handle this case for some
> >> particular slot (where you are okay with the invalidation of other
> >> slots exceeding max_slot_wal_keep_size) then the other possibility
> >> could be to have a similar variable at the slot level, but not sure
> >> if that is a good idea because you haven't presented any such case.
> >
> > IIUC, the ability to fetch WAL from the archive as a fallback
> > mechanism should automatically take care of all the lost slots. Do you
> > see a need to take care of a specific slot?
>
> No, I was just trying to see if your use case can be addressed in some
> other way. BTW, won't copying the WAL back from the archive lead to a
> disk full situation?

The idea is to download the WAL from the archive on demand as the slot
requires them and throw away the segment once processed.

> --
> With Regards,
> Amit Kapila.
Re: Reviving lost replication slots
On Tue, Nov 8, 2022 at 12:08 PM sirisha chamarthi wrote:
>
> On Fri, Nov 4, 2022 at 11:02 PM Amit Kapila wrote:
>>
>> On Fri, Nov 4, 2022 at 1:40 PM sirisha chamarthi wrote:
>> >
>> > A replication slot can be lost when a subscriber is not able to catch
>> > up with the load on the primary and the WAL to catch up exceeds
>> > max_slot_wal_keep_size. When this happens, the target has to be
>> > reseeded (pg_dump) from scratch and this can take longer. I am
>> > investigating the options to revive a lost slot.
>> >
>>
>> Why would one set max_slot_wal_keep_size in the first place if they
>> care for WAL more than that?
>
> Disk full is a typical case where we can't wait until the logical slots
> catch up before truncating the log.
>

Ideally, in such a case the subscriber should fall back to the physical
standby of the publisher but unfortunately, we don't yet have a
functionality where subscribers can continue logical replication from
physical standby. Do you think if we had such functionality it would
serve our purpose?

>> If you have a case where you want to handle this case for some
>> particular slot (where you are okay with the invalidation of other
>> slots exceeding max_slot_wal_keep_size) then the other possibility
>> could be to have a similar variable at the slot level, but not sure if
>> that is a good idea because you haven't presented any such case.
>
> IIUC, the ability to fetch WAL from the archive as a fallback mechanism
> should automatically take care of all the lost slots. Do you see a need
> to take care of a specific slot?
>

No, I was just trying to see if your use case can be addressed in some
other way. BTW, won't copying the WAL back from the archive lead to a
disk full situation?

--
With Regards,
Amit Kapila.
Re: Reviving lost replication slots
On Tue, Nov 8, 2022 at 12:08 PM sirisha chamarthi wrote:
>
> On Fri, Nov 4, 2022 at 11:02 PM Amit Kapila wrote:
>>
>> On Fri, Nov 4, 2022 at 1:40 PM sirisha chamarthi wrote:
>> >
>> > A replication slot can be lost when a subscriber is not able to catch
>> > up with the load on the primary and the WAL to catch up exceeds
>> > max_slot_wal_keep_size. When this happens, the target has to be
>> > reseeded (pg_dump) from scratch and this can take longer. I am
>> > investigating the options to revive a lost slot.
>> >
>>
>> Why would one set max_slot_wal_keep_size in the first place if they
>> care for WAL more than that?
>
> Disk full is a typical case where we can't wait until the logical slots
> catch up before truncating the log.

If max_slot_wal_keep_size is set appropriately and the replication lag
is monitored properly along with some automatic actions such as
replacing/rebuilding the standbys or subscribers (which may not be easy
and cheap though), the chances of hitting the "lost replication"
problem become less, but not zero always.

>> If you have a case where you want to handle this case for some
>> particular slot (where you are okay with the invalidation of other
>> slots exceeding max_slot_wal_keep_size) then the other possibility
>> could be to have a similar variable at the slot level, but not sure if
>> that is a good idea because you haven't presented any such case.
>
> IIUC, the ability to fetch WAL from the archive as a fallback mechanism
> should automatically take care of all the lost slots. Do you see a need
> to take care of a specific slot? If the idea is not to download the WAL
> files in the pg_wal directory, they can be placed in a slot-specific
> folder (data/pg_replslot//) until they are needed while decoding and
> can be removed.

Is the idea here the core copying back the WAL files from the archive?
If yes, I think it is not something the core needs to do. This very
well fits the job of an extension or an external module that revives
the lost replication slots by copying WAL files from the archive
location.

Having said above, what's the best way to revive a lost replication
slot today? Any automated way exists today? It seems like
pg_replication_slot_advance() doesn't do anything for the
invalidated/lost slots.

If it's a streaming replication slot, the standby will anyway jump to
archive mode ignoring the replication slot and the slot will never be
usable again unless somebody creates a new replication slot and
provides it to the standby for reuse.

If it's a logical replication slot, the subscriber will start to
diverge from the publisher and the slot will have to be revived
manually, i.e. created again.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
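[Editorial note: the invalidation step that produces a "lost" slot can be
modeled roughly as below. This is an illustrative Python sketch only; the
function names are hypothetical stand-ins for the checkpointer's
InvalidateObsoleteReplicationSlots path, and details such as locking and
terminating active walsenders are omitted.]

```python
# Simplified model of max_slot_wal_keep_size-style invalidation: once the
# segment a slot's restart_lsn points into has been removed, the slot's
# restart_lsn is cleared (InvalidXLogRecPtr) and wal_status becomes "lost".

SEG_SIZE = 16 * 1024 * 1024  # default 16MB WAL segment size

def seg_no(lsn):
    """Segment number containing a byte position in the WAL stream."""
    return lsn // SEG_SIZE

def invalidate_obsolete_slots(slots, oldest_kept_segno):
    """Mark slots lost whose restart_lsn falls before the oldest kept segment."""
    for slot in slots:
        lsn = slot["restart_lsn"]
        if lsn is not None and seg_no(lsn) < oldest_kept_segno:
            slot["restart_lsn"] = None      # InvalidXLogRecPtr
            slot["wal_status"] = "lost"
    return slots

slots = [{"name": "s2", "restart_lsn": 3 * SEG_SIZE + 100, "wal_status": "reserved"}]
invalidate_obsolete_slots(slots, oldest_kept_segno=10)
print(slots[0]["wal_status"])  # prints "lost"
```

Once restart_lsn is None in this model, there is nothing left to resume
from, which is why reviving currently requires either the discussed
archive fallback or re-creating the slot.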
Re: Reviving lost replication slots
Hi Amit,

Thanks for your comments!

On Fri, Nov 4, 2022 at 11:02 PM Amit Kapila wrote:

> On Fri, Nov 4, 2022 at 1:40 PM sirisha chamarthi wrote:
> >
> > A replication slot can be lost when a subscriber is not able to catch
> > up with the load on the primary and the WAL to catch up exceeds
> > max_slot_wal_keep_size. When this happens, the target has to be
> > reseeded (pg_dump) from scratch and this can take longer. I am
> > investigating the options to revive a lost slot.
> >
>
> Why would one set max_slot_wal_keep_size in the first place if they
> care for WAL more than that?

Disk full is a typical case where we can't wait until the logical slots
catch up before truncating the log.

> If you have a case where you want to handle this case for some
> particular slot (where you are okay with the invalidation of other
> slots exceeding max_slot_wal_keep_size) then the other possibility
> could be to have a similar variable at the slot level but not sure if
> that is a good idea because you haven't presented any such case.

IIUC, the ability to fetch WAL from the archive as a fallback mechanism
should automatically take care of all the lost slots. Do you see a need
to take care of a specific slot? If the idea is not to download the WAL
files in the pg_wal directory, they can be placed in a slot-specific
folder (data/pg_replslot//) until they are needed while decoding and
can be removed.

> --
> With Regards,
> Amit Kapila.
Re: Reviving lost replication slots
On Fri, Nov 4, 2022 at 1:40 PM sirisha chamarthi wrote:
>
> A replication slot can be lost when a subscriber is not able to catch
> up with the load on the primary and the WAL to catch up exceeds
> max_slot_wal_keep_size. When this happens, the target has to be
> reseeded (pg_dump) from scratch and this can take longer. I am
> investigating the options to revive a lost slot.
>

Why would one set max_slot_wal_keep_size in the first place if they
care for WAL more than that? If you have a case where you want to
handle this for some particular slot (where you are okay with the
invalidation of other slots exceeding max_slot_wal_keep_size), then the
other possibility could be to have a similar variable at the slot
level, but I'm not sure that is a good idea because you haven't
presented any such case.

--
With Regards,
Amit Kapila.
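[Editorial note: for readers unfamiliar with the GUC under discussion, a
minimal postgresql.conf sketch follows. The parameter names are the real
GUCs; the values are purely illustrative.]

```
# postgresql.conf (illustrative values)
# Cap how much WAL replication slots may force the server to retain.
# Slots that fall further behind than this are invalidated ("lost").
max_slot_wal_keep_size = 10GB   # -1 (the default) = retain WAL without limit
```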