On 2013-11-20 12:48:50 +0200, Heikki Linnakangas wrote: > On 19.11.2013 16:22, Andres Freund wrote: > >On 2013-11-19 15:20:01 +0100, Andres Freund wrote: > >>Imo something the attached patch should be done. The description I came > >>up with is: > >> > >> Fix Hot-Standby initialization of clog and subtrans. > > Looks ok for a back-patchable fix. > > It's a bit bizarre that the ExtendSUBTRANS loop in > ProcArrayApplyRecoveryInfo looks different from the one in > RecordKnownAssignedTransactionIds, but both look correct to me.
That's because of the different upper bounds (nextxid) vs xid]), but yea, I wondered about that as well. > In master, it'd be nice to do some further cleanup. Some gripes: > > In ProcArrayApplyXidAssignment, I wonder if it would be best to just remove > the (standbyState == STANDBY_INITIALIZED) check altogether. The > KnownAssignedXidsRemoveTree() that follows is harmless if there is nothing > in the known-assigned-xids array (xact_redo_commit does it in > STANDBY_INITIALIZED state too). The other thing that's done after that check > is updating lastOverflowedXid, and AFAICS it would be harmless to update > that, too. It will be overwritten by the ProcArrayApplyRecoveryInfo() call > that comes later. I was thinking about removing it entirely in the patch, but chose not to do so. I don't really care which way we go. > Clog, subtrans and multixact are all handled differently. Extensions of clog > and multixact are WAL-logged, but extensions of subtrans are not. They all > have a Startup function, but it has a slightly different purpose. > StartupCLOG only sets latest_page_number, but StartupSUBTRANS and > StartupMultiXact zero out the current page. For CLOG, the TrimCLOG() > function does that. Why is clog handled differently from multixact? I'd guess clog and multixact are handled differently because multixact supposedly is never queried during recovery. But I don't that's actually still true, thinking of 9.3's changes around fkey locks and HeapTupleGetUpdateXid(). So it's probably time to split StartupMultiXact similar to clog's routines. > StartupCLOG() and StartupMultiXact set latest_page_number, but > StartupSUBTRANS does not. Is that a problem for subtrans? I don't think it is, the difference is that StartupSUBTRANS() zeroes out the current subtrans page which will set latest_page_number, the other's access the pages normally, which doesn't set it. > StartupCLOG() and > StartupMultiXact() are called at different stages in hot standby - > StartupCLOG() is called at the beginning of recovery, but StartupMultiXact() > is only called at end of recovery. Why the discrepancy, does > latest_page_number need to be set during hot standby or not? latest_page_number primarily is an optimization, isn't it? Except for a safeguard check in SimpleLruTruncate() it should only influence victim buffer initialization. But: slru.c explicitly doesn't initialize ->latest_page_number, which means it's zeroed from a memset slightly above. Which seems we might cause problems when performing truncations during recovery, before the first page is zeroed (which'd set latest_page_number again). ... Hm. Do we actually *ever* truncate the multixact slru during recovery? clog.c's truncations are WAL logged, TruncateSUBTRANS() is performed by restartpoints, but there's no callers to TruncateMultiXact but vac_truncate_clog and it's not logged? That doesn't seem right. > I think we should bite the bullet, and WAL-log the extension of subtrans, > too. Then make the startup and extension procedure for all the SLRUs the > same. Hm. I don't really see a big advantage in that? I am all for trying to bring more symetry to the startup routines, but I don't think forcing WAL logging for something scrapped every restart is necessary for that. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers