On Sun, Oct 31, 2021 at 3:38 PM Peter Eisentraut
<peter.eisentr...@enterprisedb.com> wrote:
>
> I want to reactivate $subject.  I took Petr Jelinek's patch from [0],
> rebased it, added a bit of testing.  It basically works, but as
> mentioned in [0], there are various issues to work out.
>
> The idea is that the standby runs a background worker to periodically
> fetch replication slot information from the primary.  On failover, a
> logical subscriber would then ideally find up-to-date replication slots
> on the new publisher and can just continue normally.
>
> The previous thread didn't have a lot of discussion, but I have gathered
> from off-line conversations that there is a wider agreement on this
> approach.  So the next steps would be to make it more robust and
> configurable and documented.  As I said, I added a small test case to
> show that it works at all, but I think a lot more tests should be added.
>   I have also found that this breaks some seemingly unrelated tests in
> the recovery test suite.  I have disabled these here.  I'm not sure if
> the patch actually breaks anything or if these are just differences in
> timing or implementation dependencies.  This patch adds a LIST_SLOTS
> replication command, but I think this could be replaced with just a
> SELECT FROM pg_replication_slots query now.  (This patch is originally
> older than when you could run SELECT queries over the replication protocol.)
>
> So, again, this isn't anywhere near ready, but there is already a lot
> here to gather feedback about how it works, how it should work, how to
> configure it, and how it fits into an overall replication and HA
> architecture.
>
>
> [0]:
> https://www.postgresql.org/message-id/flat/3095349b-44d4-bf11-1b33-7eefb585d578%402ndquadrant.com

Thanks for working on this patch. This feature will be useful as it
avoids manual intervention during the failover.

Here are some thoughts:
1) Instead of a new LIST_SLOT command, can't we use
READ_REPLICATION_SLOT (slight modifications needs to be done to make
it support logical replication slots and to get more information from
the subscriber).

2) How frequently the new bg worker is going to sync the slot info?
How can it ensure that the latest information exists say when the
subscriber is down/crashed before it picks up the latest slot
information?

3) Instead of the subscriber pulling the slot info, why can't the
publisher (via the walsender or a new bg worker maybe?) push the
latest slot info? I'm not sure we want to add more functionality to
the walsender, if yes, isn't it going to be much simpler?

4) IIUC, the proposal works only for logical replication slots but do
you also see the need for supporting some kind of synchronization of
physical replication slots as well? IMO, we need a better and
consistent way for both types of replication slots. If the walsender
can somehow push the slot info from the primary (for physical
replication slots)/publisher (for logical replication slots) to the
standby/subscribers, this will be a more consistent and simplistic
design. However, I'm not sure if this design is doable at all.

Regards,
Bharath Rupireddy.


Reply via email to