On 7 June 2018 at 22:25, Andres Freund <and...@anarazel.de> wrote: > On 2018-06-07 14:19:18 -0700, Andres Freund wrote:
> Look at: > > void > ProcArrayApplyRecoveryInfo(RunningTransactions running) > ... > /* > * Remove stale locks, if any. > * > * Locks are always assigned to the toplevel xid so we don't need to > care > * about subxcnt/subxids (and by extension not about ->suboverflowed). > */ > StandbyReleaseOldLocks(running->xcnt, running->xids); > > by excluding running transactions you have, as far as I can tell, > effectively removed the vacuum truncation AEL from the standby. I agree, that is correct, there is a bug in my recent commit that causes a small race window that could potentially lead to someone reading the size of a relation just before it is truncated and then falling off the end of the scan, resulting in a block access ERROR, potentially much later than the truncate. I have also found another bug which affects what we do next. For context, AEL locks are normally removed by COMMIT or ABORT. StandbyReleaseOldLocks() is just a sweeper to catch anything that didn't send an abort before it died, so it hardly ever activates. The coding of StandbyReleaseOldLocks() is backwards... if it ain't in the running list, then we remove it. But that wasn't working correctly either, since as of 49bff5300d527 we assigned AELs to subxids. Subxids weren't in the running list and so AELs held by them would have been removed at the wrong time, an extant bug in PG10. It looks to me like they would have been removed either early or late, up to the next runningxact info record. They would be removed, so no leakage, but the late timing wouldn't be noticeable for tests or most usage, since it would look just like lock contention. Early release might give same issue of block access to non-existent block/file. So the attached patch fixes both the bug in the recent commit and the one I just found by observation of 49bff5300d527, since they are related. StandbyReleaseOldLocks() can sweep in the same way as ExpireOldKnownAssignedTransactionIds(). > I also don't understand why this change would be backpatched in the > first place. It's a relatively minor efficiency thing, no? As for everything, that is open to discussion. Yes, it seems minor to me.... until it affects you, then its not. It seems to have affected Greg. The attached patch, or a later revision, needs to be backpatched to PG10 independently of the recent committed patch. I have yet to test this manually, but will do so tomorrow morning. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
remove_standby_subxid_locks.v1.patch
Description: Binary data