A few thoughts on failover slots vs the alternative of pushing catalog_xmin up to the master via a replica's slot and creating independent slots on replicas.
Failover slots:
---

+ Failover slots are very easy for applications. They "just work" and are transparent for failover. This is great especially for things that aren't complex replication schemes, that just want to use logical decoding.

+ Applications don't have to know what replicas exist or be able to reach them; transparent failover is easier.

- Failover slots can't be used from a cascading standby (where we can fail down to the standby's own replicas) because they have to write WAL to advance the slot position. They'd have to send the slot position update "up" to the master then wait to replay it. Not a disaster, though they'd do extra work on reconnect until a restart_lsn update replayed. Would require a whole new feedback-like message on the rep protocol, and couldn't work at all with archive replication. Ugly as hell.

+ Failover slots exist now, and could be added to 9.6.

- The UI for failover slots can't be re-used for the catalog_xmin push-up approach to allow replay from failover slots on cascading standbys in 9.7+. There'd be no way to propagate the creation of failover slots "down" the replication hierarchy that way, especially to archive standbys like failover slots will do. So it'd be semantically different and couldn't re-use the FS UI. We'd be stuck with failover slots even if we also did the other way later.

+ Will work for recovery of a master PITR-restored up to the latest recovery point

Independent slots on replicas + catalog_xmin push-up
---

With this approach we allow creation of replication slots on a replica independently of the master. The replica is required to connect to the master via a slot. We send feedback to the master to advance the replica's slot on the master to the confirmed_lsn of the most-behind slot on the replica, therefore pinning the master's catalog_xmin where needed. Or we just send a new feedback message type that directly sets a catalog_xmin on the replica's physical slot in the master.
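To make the push-up a bit more concrete, here is a rough Python sketch of what a replica would compute before sending feedback: the position of its most-behind slot, plus the oldest catalog_xmin any of its slots still needs. This is only an illustration of the idea above, not PostgreSQL code. The function names and the dict layout are made up for the example, and the plain min() over catalog_xmin ignores xid wraparound, which a real implementation could not.

```python
def parse_lsn(text):
    """Convert the textual LSN form 'hi/lo' (two hex fields) to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(value):
    """Convert an integer LSN back to the 'hi/lo' hex text form."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


def pushup_feedback(replica_slots):
    """Return (lsn, catalog_xmin) to report to the master, or None.

    replica_slots is a hypothetical list of dicts, one per logical slot
    on the replica, with keys 'confirmed_lsn' and 'catalog_xmin'.
    """
    if not replica_slots:
        return None  # nothing on this replica pins the master
    # The most-behind slot decides how far the master-side slot may advance.
    most_behind = min(parse_lsn(s["confirmed_lsn"]) for s in replica_slots)
    # Assumption: xids are compared as plain integers (no wraparound handling).
    oldest_xmin = min(s["catalog_xmin"] for s in replica_slots)
    return format_lsn(most_behind), oldest_xmin


slots = [
    {"name": "app_a", "confirmed_lsn": "0/3000060", "catalog_xmin": 742},
    {"name": "app_b", "confirmed_lsn": "0/2FFFF00", "catalog_xmin": 731},
]
print(pushup_feedback(slots))  # ('0/2FFFF00', 731)
```

Under the first variant only the LSN would be sent and the master would derive the catalog_xmin pin from the slot position; under the second the catalog_xmin itself would go in the new feedback message.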
Slots are _not_ cloned from master to replica automatically.

- More complicated for applications to use. They have to create a slot on each replica that might be failed over to, as well as on the master, and have to advance all those slots to stop the master from suffering severe catalog bloat. (But see note below).

- Applications must be able to connect to failover-candidate standbys and know where they are; it's not automagically handled via WAL. (But see note below).

- Applications need reconfiguration whenever a standby is rebuilt, moved, etc. (But see note below).

- Cannot work at all for archive-based replication; it requires a slot from replica to master.

+ Works with replay from cascading standbys

+ Actually solves one of the problems making logical slots on standbys unsupported at the moment by giving us a way to pin the master's catalog_xmin to that needed by a replica.

- Won't work for a standby PITR-restored up to latest.

- Vapourware with zero hope for 9.6

Note: I think the application complexity issues can be solved - to a degree - by having the replicas run a bgworker based helper that connects to the master and clones the master's slots then advances them automatically.

Do nothing
---

Drop the idea of being able to follow physical failover on logical slots.

I've already expressed why I think this is a terrible idea. It's hostile to application developers who'd like to use logical decoding. It makes integration of logical replication with existing HA systems much harder. It means we need really solid, performant, well-tested and mature logical rep based HA before we can take logical rep seriously, which is a long way out given that we can't do decoding of in-progress xacts, DDL, sequences, etc.

Some kind of physical HA for logical slots is needed and will be needed for some time. Logical rep will be great for selective replication, replication over WAN, filtered/transformed replication etc.
Physical rep is great for knowing you'll get exactly the same thing on the replica that you have on the master and it'll Just Work.

In any case, "Do nothing" is the same for 9.6 as pursuing the catalog_xmin push-up idea; in both cases we don't commit anything in 9.6.