Re: [HACKERS] Hot standby, race condition between recovery snapshot and commit
Simon Riggs wrote: > On Sun, 2009-11-15 at 21:37 +0200, Heikki Linnakangas wrote: > >> Am I missing anything? > > Will review. Thanks! Please use the head of git branch, I already found one major oversight in what I posted that's fixed there... I should go to bed already. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot standby, race condition between recovery snapshot and commit
Oh, forgot to mention another thing that I've been pondering: Currently, the running-xacts record is written to the WAL after the checkpoint record. There's a small chance that you get an xlog switch in between. If that happens, it might take a long time after the checkpoint record until the standby sees the running-xacts record, so it might take a long time until the standby can open up for connections. In general, I'd like to remove as many as possible of those cases where the standby starts up, and can't open up for connections. It makes the standby a lot less useful if you can't rely on it being open. So I'd like to make it so that the standby can *always* open up. There's currently three cases where that can happen: 1. If the subxid cache has overflown. 2. If there's no running-xacts record after the checkpoint record for some reason. For example, one was written but not archive yet, or because the master crashed before it was written. 3. If too many AccessExclusiveLocks was being held. Case 3 should be pretty easy to handle. Just need to WAL log all the AccessExclusiveLocks, perhaps as separate WAL records (we already have a new WAL record type for logging locks) if we're worried about the running-xacts record growing too large. I think we could handle case 2 if we wrote the running-xacts record *before* the checkpoint record. Then it would be always between the REDO pointer of the checkpoint record, and the checkpoint record itself, so it would always be seen by the WAL recovery. To handle case 1, we could scan pg_subtrans. It would add some amount of code and would add some more work to taking the running-xacts snapshot, but it could be done. This isn't absolutely necessary for the first version, but it's something to keep in mind... -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot standby, race condition between recovery snapshot and commit
Simon Riggs wrote: > On Sat, 2009-11-14 at 14:59 +0200, Heikki Linnakangas wrote: >> I can't see any obvious way around that. > > Huh? We're only doing this strict locking approach because you insisted > that the looser approach was not acceptable. Take it easy, Simon. By obvious, I meant "trivial" or "easy". I believe you're referring to this (http://archives.postgresql.org/message-id/4a8ce561.4000...@enterprisedb.com): > If there's a way, I would prefer a solution where the RunningXacts > snapshot represents the situation the moment it appears in WAL, not some > moment before it. It makes the logic easier to understand. or did we have further discussion on that since? > Have you forgotten that > discussion so completely that you can't even remember the existence of > other options? I do remember that. I've been thinking about the looser approach a lot since yesterday. So, if we drop the notion that the running-xacts record represents the situation at the exact moment it appears in WAL, what do we have to change? Creating the running-xacts snapshot becomes easier, but when we replay it, we must take the snapshot with a grain of salt. 1. the snapshot can contain xids that have already finished (= we've already seen the commit/abort record) 2. the snapshot can lack xids belonging to transactions that have just started, between the window when the running-xacts snapshot is taken in the master and it's written to WAL. Problem 1 is quite easy to handle: just check every xid in clog. If it's marked there as finished already, it can be ignored. For problem 2, if a transaction hasn't written any WAL yet, we might as well treat it as not-yet-started in the standby, so we're concerned about transactions that have written a WAL record between when the running-xacts snapshot was taken and written to WAL. Assuming the snapshot was taken after the REDO pointer of the checkpoint record, the standby has seen the WAL record and therefore has all the information it needs. Currently, the standby doesn't add xids to known-assigned list until it sees the running-xacts record, but we could change that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Hot standby, race condition between recovery snapshot and commit
There's a race condition between transaction commit and GetRunningTransactionData(). If GetRunningTransactionData() runs between the RecordTransactionCommit() and ProcArrayEndTransaction() calls in CommitTransaction(): > /* >* Here is where we really truly commit. >*/ > latestXid = RecordTransactionCommit(false); > > TRACE_POSTGRESQL_TRANSACTION_COMMIT(MyProc->lxid); > > /* >* Let others know about no transaction in progress by me. Note that > this >* must be done _before_ releasing locks we hold and _after_ >* RecordTransactionCommit. >*/ > ProcArrayEndTransaction(MyProc, latestXid); The running-xacts snapshot will include the transaction that's just committing, but the commit record will be before the running-xacts WAL record. If standby initializes transaction tracking from that running-xacts record, it will consider the just-committed transactions as still in-progress until the next running-xact record (at next checkpoint). I can't see any obvious way around that. We could have transaction commit acquire the new RecoveryInfoLock across those two calls, but I'd like to avoid putting any extra overhead into such a critical path. Hmm, actually ProcArrayApplyRecoveryInfo() could check every xid in the running-xacts record against clog. If it's marked as finished in clog already (because we already saw the commit/abort record before the running-xacts record), we know it's not running after all. Because of the sequence that commit removes entry from procarray and releases locks, it also seems possible for GetRunningTransactionsData() to acquire a snapshot that contains an AccessExclusiveLock for a transaction, but that XID is not listed as running in the XID list. That sounds like trouble too. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers