Hi Hackers,

I am considering implementing an RPO (recovery point objective)
enforcement feature for Postgres, in which WAL writes on the primary are
stalled when the WAL distance between the primary and the standby exceeds
a configured threshold (replica_lag_in_bytes). This feature is
particularly useful in disaster recovery setups where the primary and
standby are in different regions: synchronous replication can't be used
there for latency and performance reasons, yet some level of RPO
enforcement is still required.

The idea here is to calculate the lag between the primary and the standby
(Async?) server during XLogInsert and block the caller until the lag
drops below the threshold value. We can calculate the max lag by iterating
over ReplicationSlotCtl->replication_slots. If this is not something we
want to do in core, at least adding a hook for XLogInsert would be of
great value.
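
To make the lag check concrete, here is a rough sketch written as
standalone C rather than backend code. SketchSlot, WalPos and the two
sketch_* functions are hypothetical stand-ins, not existing Postgres
names; only replica_lag_in_bytes comes from the proposal above. Which
per-slot position the real check should compare against (and how the
slot array would be locked while scanning) is left open.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a 64-bit WAL position (assumption, not the backend type). */
typedef uint64_t WalPos;

/* Hypothetical, simplified slot; a real replication slot has more fields. */
typedef struct SketchSlot
{
    bool   in_use;         /* is this slot allocated? */
    WalPos confirmed_pos;  /* WAL position the standby has confirmed */
} SketchSlot;

/*
 * Return the largest distance, in bytes, between the primary's current
 * insert position and any in-use slot. Slots that are already at or
 * ahead of insert_pos contribute no lag.
 */
static uint64_t
sketch_max_lag(const SketchSlot *slots, size_t nslots, WalPos insert_pos)
{
    uint64_t max_lag = 0;

    for (size_t i = 0; i < nslots; i++)
    {
        uint64_t lag;

        if (!slots[i].in_use || slots[i].confirmed_pos >= insert_pos)
            continue;

        lag = insert_pos - slots[i].confirmed_pos;
        if (lag > max_lag)
            max_lag = lag;
    }
    return max_lag;
}

/* True when the caller should wait before inserting more WAL. */
static bool
sketch_should_stall(const SketchSlot *slots, size_t nslots,
                    WalPos insert_pos, uint64_t replica_lag_in_bytes)
{
    return sketch_max_lag(slots, nslots, insert_pos) > replica_lag_in_bytes;
}
```

In the backend, the caller would presumably loop on something like
sketch_should_stall() with a sleep or latch wait in between, and would
need to stay interruptible while it is blocked.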

A few other scenarios I can think of with the hook are:

   1. Enforcing RPO as described above
   2. Enforcing a rate limit and gradual throttling when a sync standby is
   falling behind (could be flush lag or replay lag)
   3. Transaction log rate governance - useful for cloud providers to
   offer SKU sizes based on allowed WAL writes.
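
For discussion, the hook could look roughly like the sketch below. It
is standalone C under stated assumptions: the hook type, the global
pointer, sketch_before_wal_insert() and the WalBudget policy are all
hypothetical names made up for illustration, and the real call site
and signature would need to be worked out. The example policy covers
scenario 3 with a fixed byte budget per accounting window.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical hook signature: given the size of the WAL about to be
 * inserted, return how many microseconds the caller should wait first.
 * Zero means "proceed now".
 */
typedef uint64_t (*wal_insert_throttle_hook_type) (uint64_t nbytes, void *arg);

/* Global function pointer an extension would set; NULL means no hook. */
static wal_insert_throttle_hook_type wal_insert_throttle_hook = NULL;
static void *wal_insert_throttle_arg = NULL;

/* What the insert path would call before writing a record (assumption). */
static uint64_t
sketch_before_wal_insert(uint64_t nbytes)
{
    if (wal_insert_throttle_hook == NULL)
        return 0;
    return wal_insert_throttle_hook(nbytes, wal_insert_throttle_arg);
}

/* Example policy: a fixed WAL byte budget per accounting window. */
typedef struct WalBudget
{
    uint64_t budget_bytes;         /* bytes allowed per window */
    uint64_t used_bytes;           /* bytes charged so far; never exceeds budget */
    uint64_t window_remaining_us;  /* time until the window resets */
} WalBudget;

static uint64_t
budget_hook(uint64_t nbytes, void *arg)
{
    WalBudget *b = (WalBudget *) arg;

    /* Charge the budget if the record fits in what is left. */
    if (nbytes <= b->budget_bytes - b->used_bytes)
    {
        b->used_bytes += nbytes;
        return 0;
    }
    /* Otherwise ask the caller to wait until the window resets. */
    return b->window_remaining_us;
}
```

The same hook shape would cover scenarios 1 and 2 by swapping the
budget policy for one that looks at standby lag instead.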

Thoughts?

Thanks,
Satya
