Andres Freund <and...@2ndquadrant.com> writes:
> On 2014-01-21 21:42:19 -0500, Tom Lane wrote:
>> Uh, what? The behavior I'm talking about is *exactly the same*
>> as what happens now. The only change is that the data sent to the
>> WAL file is laid out a bit differently, and the replay logic has
>> to work harder to reassemble it before it can apply the commit or
>> abort action. If anything outside replay can detect a difference
>> at all, that would be a bug.
>>
>> Once again: the replayer is not supposed to act immediately on the
>> subsidiary records. It's just supposed to remember their contents
>> so it can reattach them to the eventual commit or abort record,
>> and then do what it does today to replay the commit or abort.
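The "remember, then reattach" step in the quoted text can be sketched roughly as follows. This is a hypothetical illustration, not PostgreSQL's actual replay code: the type `Xid`, the struct `PendingXact` and the functions `pending_init()`, `remember_subsidiary()` and `reassemble_final()` are invented names, only one in-flight transaction is handled, and record payloads are assumed to be opaque byte strings that are simply concatenated in the order they are read. What it illustrates is that nothing is applied when a subsidiary record arrives, and that the bytes handed on when the final commit or abort record shows up are the same bytes a single combined record would have carried.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a transaction id (sketch only). */
typedef uint32_t Xid;

/* Subsidiary payloads remembered (not yet applied) for one transaction. */
typedef struct PendingXact
{
    Xid     xid;
    char   *data;   /* payloads concatenated in the order they were read */
    size_t  len;
} PendingXact;

/* Start with nothing remembered for xid. */
static void
pending_init(PendingXact *p, Xid xid)
{
    p->xid = xid;
    p->data = NULL;
    p->len = 0;
}

/* Subsidiary record: do not act on it, only append its payload. */
static int
remember_subsidiary(PendingXact *p, Xid xid, const char *payload, size_t n)
{
    char *grown;

    assert(xid == p->xid);      /* this sketch tracks a single transaction */
    if (n == 0)
        return 0;
    grown = realloc(p->data, p->len + n);
    if (grown == NULL)
        return -1;              /* old buffer is still owned by p */
    memcpy(grown + p->len, payload, n);
    p->data = grown;
    p->len += n;
    return 0;
}

/*
 * Final commit or abort record: append its own payload, then hand the whole
 * reassembled buffer to the caller, who frees it.  In this sketch the
 * existing replay routine would then run on *out as if one combined record
 * had been read.
 */
static int
reassemble_final(PendingXact *p, Xid xid, const char *payload, size_t n,
                 char **out, size_t *out_len)
{
    if (remember_subsidiary(p, xid, payload, n) != 0)
        return -1;
    *out = p->data;
    *out_len = p->len;
    p->data = NULL;             /* nothing remains pending for this xid */
    p->len = 0;
    return 0;
}
```

For example, two subsidiary records carrying "ab" and "cd" followed by a final record carrying "ef" reassemble into the six bytes "abcdef", which is what an unsplit record would have contained.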
> I (think) I get what you want to do, but splitting the record like that
> nonetheless opens up behaviour that previously wasn't there.

Obviously we are not on the same page yet. In my vision, the WAL writer
is dumping the same data it would have dumped, though in a different
layout, and it's working from process-local state same as it does now.
The WAL replayer is taking the same actions at the same time using the
same data as it does now. There is no "behavior that wasn't there",
unless you're claiming that there are *existing* race conditions in
commit/abort WAL processing.

The only thing that seems mildly squishy about this is that it's not
clear how long the WAL replayer ought to hang onto subsidiary records
for a commit or abort it hasn't seen yet. In the case where we change
our minds and abort a transaction after already having written some
subsidiary records for the commit, it's not really a problem; the
replayer can throw away any saved data related to the commit of xid N
as soon as it sees an abort for xid N. However, what if the session
crashes and never writes either a final commit or abort record?

I think we can deal with this fairly easily though, because that case
should end with a crash recovery cycle writing a shutdown checkpoint to
the log (we do do that, no?). So the rule can be "discard any unmatched
subsidiary records if you see a shutdown checkpoint". This makes sense
on its own terms since there are surely no active transactions at that
point in the log.

			regards, tom lane
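The lifetime rules proposed above for saved subsidiary data (drop it for xid N as soon as an abort for xid N arrives, and drop everything still unmatched at a shutdown checkpoint) can be sketched as a small table keyed by xid. This is a hypothetical illustration only: the record kinds, the fixed-size table `pending[]` and the functions `find_entry()`, `pending_count()` and `replay_record()` are invented for the sketch; it considers only subsidiary records written for a commit (the case discussed above); and it tracks a count of saved records rather than their payloads. It is not how PostgreSQL implements replay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a transaction id (sketch only). */
typedef uint32_t Xid;

/* Hypothetical WAL record kinds for this sketch only. */
typedef enum
{
    REC_SUBSIDIARY,          /* piece of the eventual commit record for xid */
    REC_COMMIT,              /* final commit record for xid */
    REC_ABORT,               /* abort record for xid */
    REC_SHUTDOWN_CHECKPOINT  /* no transactions can be active at this point */
} RecKind;

#define MAX_PENDING 64

/* Saved, not-yet-applied subsidiary data, reduced to a count per xid. */
typedef struct
{
    bool    used;
    Xid     xid;
    int     nsaved;
} PendingEntry;

static PendingEntry pending[MAX_PENDING];

/* Look up xid's entry; if create is true, claim a free slot when absent. */
static PendingEntry *
find_entry(Xid xid, bool create)
{
    PendingEntry *free_slot = NULL;

    for (int i = 0; i < MAX_PENDING; i++)
    {
        if (pending[i].used && pending[i].xid == xid)
            return &pending[i];
        if (!pending[i].used && free_slot == NULL)
            free_slot = &pending[i];
    }
    if (!create || free_slot == NULL)
        return NULL;
    free_slot->used = true;
    free_slot->xid = xid;
    free_slot->nsaved = 0;
    return free_slot;
}

/* Number of transactions that still have saved subsidiary data. */
static int
pending_count(void)
{
    int n = 0;

    for (int i = 0; i < MAX_PENDING; i++)
        if (pending[i].used)
            n++;
    return n;
}

/*
 * Replay one record.  Returns how many saved subsidiary records were
 * reattached to a commit (0 for every other kind), or -1 if the table is full.
 */
static int
replay_record(RecKind kind, Xid xid)
{
    PendingEntry *e;
    int reattached;

    switch (kind)
    {
        case REC_SUBSIDIARY:
            /* Don't act yet; just remember. */
            e = find_entry(xid, true);
            if (e == NULL)
                return -1;
            e->nsaved++;
            return 0;

        case REC_COMMIT:
            /* Reattach saved pieces, then replay the commit as today. */
            e = find_entry(xid, false);
            reattached = (e != NULL) ? e->nsaved : 0;
            if (e != NULL)
                e->used = false;
            return reattached;

        case REC_ABORT:
            /* We changed our minds: saved commit data for xid is garbage. */
            e = find_entry(xid, false);
            if (e != NULL)
                e->used = false;
            return 0;

        case REC_SHUTDOWN_CHECKPOINT:
            /* Crash recovery finished: discard every unmatched entry. */
            for (int i = 0; i < MAX_PENDING; i++)
                pending[i].used = false;
            return 0;
    }
    return 0;
}
```

Keying the table by xid is what makes the abort rule cheap: an abort for xid N needs to touch only N's entry, while a shutdown checkpoint can clear the whole table because, as argued above, no transaction can still be in progress at that point in the log.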