Simon Riggs wrote:
Taking snapshots from primary has a few disadvantages

 ...
      * snapshots on primary prevent row removal (but this was also an
        advantage of this technique!)

That makes it an awful solution for high availability. A backend hung in transaction-in-progress state in the slave will prevent row removal on the master. Isolating the master from queries performed in the slave is exactly the reason why people use hot standby. And running long reporting queries in the standby is again a very typical use case.

And still we can't escape the scenario that the slave receives a WAL record that vacuums away a tuple that's still visible according to a snapshot used in the slave. Even with the proposed scheme, this can happen:

1. Slave receives a snapshot from master
2. A long-running transaction begins on the slave, using that snapshot
3. Network connection is lost
4. Master hits a timeout, and decides to discard the snapshot it sent to the slave
5. A tuple visible to the snapshot is vacuumed
6. Network connection is re-established
7. Slave receives the vacuum WAL record, even though the long-running transaction still needs the tuple.
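
To make the sequence above concrete, here is a toy model of it. Everything in it is made up for illustration: the Master class, the timeout value, and the use of plain integers as xids (no wraparound handling). It is not how any real implementation tracks snapshots; it only shows that once the master drops the snapshot, its vacuum horizon can pass the xmin the slave query still depends on.

```python
class Master:
    """Toy master that remembers the xmin of one snapshot sent to the slave."""

    def __init__(self, local_oldest_xmin, timeout):
        self.local_oldest_xmin = local_oldest_xmin
        self.timeout = timeout          # hypothetical: seconds without contact
        self.slave_xmin = None
        self.last_contact = None

    def send_snapshot(self, xmin, now):
        # Steps 1-2: the slave gets a snapshot and starts a long query on it.
        self.slave_xmin = xmin
        self.last_contact = now

    def vacuum_horizon(self, now):
        # Step 4: after the timeout the master forgets the slave's snapshot.
        if self.slave_xmin is not None and now - self.last_contact > self.timeout:
            self.slave_xmin = None
        if self.slave_xmin is None:
            return self.local_oldest_xmin
        return min(self.local_oldest_xmin, self.slave_xmin)


master = Master(local_oldest_xmin=500, timeout=30)
master.send_snapshot(xmin=100, now=0)
slave_query_xmin = 100                       # still in use on the slave

horizon_connected = master.vacuum_horizon(now=10)
horizon_after_timeout = master.vacuum_horizon(now=60)   # steps 3-5

# Step 7: the vacuum WAL record removes tuples the slave query can still see.
slave_breaks = slave_query_xmin < horizon_after_timeout
```

While the connection is alive the horizon stays at the slave's xmin; after the timeout it jumps to the master's own oldest xmin, and the slave query is left behind.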

I like the idea of acquiring snapshots locally in the slave much more. As you mentioned, the options there are to defer applying WAL, or cancel queries. I think both options need the same ability to detect when you're about to remove a tuple that's still visible to some snapshot, just the action is different. We should probably provide a GUC to control which you want.
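
A minimal sketch of what I mean by "same detection, different action". The names (ConflictPolicy, latest_removed_xid, resolve) are hypothetical, xids are plain integers with no wraparound, and the rule "a snapshot conflicts if its xmin is not newer than the newest xid whose deleted tuples are being removed" is a simplification of real visibility checks.

```python
from enum import Enum


class ConflictPolicy(Enum):
    """Stand-in for the proposed GUC; the name and values are made up."""
    DEFER_WAL = "defer"
    CANCEL_QUERIES = "cancel"


def conflicting_snapshots(latest_removed_xid, snapshot_xmins):
    # A snapshot may still see a removed tuple if the deleting transaction
    # is not older than the snapshot's xmin.
    return [xmin for xmin in snapshot_xmins if xmin <= latest_removed_xid]


def resolve(latest_removed_xid, snapshot_xmins, policy):
    """Decide what to do with a WAL record that removes tuples."""
    conflicts = conflicting_snapshots(latest_removed_xid, snapshot_xmins)
    if not conflicts:
        return ("apply", [])
    if policy is ConflictPolicy.DEFER_WAL:
        # Hold the record back until the conflicting snapshots go away.
        return ("wait", conflicts)
    # Otherwise cancel the queries holding those snapshots, then apply.
    return ("cancel_then_apply", conflicts)


deferred = resolve(100, [90, 150], ConflictPolicy.DEFER_WAL)
cancelled = resolve(100, [90, 150], ConflictPolicy.CANCEL_QUERIES)
harmless = resolve(100, [150], ConflictPolicy.DEFER_WAL)
```

The detection step is shared; only the last branch depends on the setting.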

However, if we still want to provide the behavior that "as long as the network connection works, the master will not remove tuples still needed in the slave" as an option, a much simpler implementation is to periodically send the slave's oldest xmin to the master. The master can take that into account when calculating its own oldest xmin. That requires a lot less communication than the proposed scheme of sending snapshots back and forth. A softer version of that is also possible, where the master obeys the slave's oldest xmin, but only up to a point.
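
Roughly, the master-side calculation could look like the sketch below. The function name, the max_lag_xids knob for the "softer" variant, and the use of plain integer xids without wraparound are all assumptions made for illustration.

```python
def master_oldest_xmin(local_oldest_xmin, slave_oldest_xmin, next_xid,
                       max_lag_xids=None):
    """Combine the master's own oldest xmin with the one reported by the slave.

    slave_oldest_xmin is None when the slave has not reported anything.
    max_lag_xids, if set, limits how far behind next_xid the slave is
    allowed to hold the master back (the softer version).
    """
    if slave_oldest_xmin is None:
        return local_oldest_xmin
    effective = slave_oldest_xmin
    if max_lag_xids is not None:
        # Obey the slave only up to a point: never older than this floor.
        effective = max(slave_oldest_xmin, next_xid - max_lag_xids)
    return min(local_oldest_xmin, effective)


strict = master_oldest_xmin(950, 700, next_xid=1000)
soft = master_oldest_xmin(950, 700, next_xid=1000, max_lag_xids=200)
no_report = master_oldest_xmin(950, None, next_xid=1000)
```

In the strict case the master holds back to the slave's xmin (700); with the cap it only holds back to 800, so a slave query older than that would still have to be handled by deferring WAL or cancelling the query.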

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers