Tom Lane wrote:
> I observed a curious bug in autovac just now.  Since plain vacuum avoids
> calling GetTransactionSnapshot, an autovac worker that happens not to
> analyze any tables will never call GetTransactionSnapshot at all.
> This means it will arrive at vac_update_datfrozenxid with
> RecentGlobalXmin never having been changed from its boot value of
> FirstNormalTransactionId, which means that it will fail to update the
> database's datfrozenxid ... or, if the current value of datfrozenxid
> is past 2 billion, that it will improperly advance datfrozenxid to
> sometime in the future.
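
To make the two failure modes concrete, here is a minimal standalone sketch of the wraparound-aware XID comparison. It is modeled on the modulo-2^32 logic of TransactionIdPrecedes, but the names xid_t, xid_precedes and would_advance are made up for this illustration and are not backend code; would_advance only assumes that datfrozenxid is advanced when it precedes the candidate value, which is a simplification of what vac_update_datfrozenxid does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for TransactionId (a 32-bit unsigned counter). */
typedef uint32_t xid_t;

/* XIDs below 3 are reserved; 3 is the first "normal" XID. */
#define FIRST_NORMAL_XID ((xid_t) 3)

/*
 * Wraparound-aware "a is older than b", in the style of
 * TransactionIdPrecedes: normal XIDs are compared modulo 2^32, so each
 * XID sees about 2 billion XIDs as older and 2 billion as newer.
 * Reserved XIDs fall back to a plain unsigned comparison.
 */
static bool
xid_precedes(xid_t a, xid_t b)
{
    if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
        return a < b;

    /* The unsigned-to-signed cast wraps on all common platforms. */
    return (int32_t) (a - b) < 0;
}

/*
 * Assumed, simplified update rule: datfrozenxid is only moved when it
 * precedes the candidate value.
 */
static bool
would_advance(xid_t datfrozenxid, xid_t candidate)
{
    return xid_precedes(datfrozenxid, candidate);
}
```

With the candidate stuck at FIRST_NORMAL_XID, would_advance(900, FIRST_NORMAL_XID) is false, so datfrozenxid never moves. would_advance(2500000000u, FIRST_NORMAL_XID) is true, because 3 looks newer than 2.5 billion in modular terms, and that is the improper advance into the future.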

Ouch :-(

> I've only directly tested this in HEAD, but I suspect the problem goes
> back a ways.

Well, this logic was introduced in 8.2; I'm not sure if there's a
problem in 8.1, but I don't think so.

> On reflection I'm not even sure that this is strictly an autovacuum
> bug.  It can be cast more generically as "RecentGlobalXmin getting
> used without ever having been set", and it sure looks to me like the
> HOT patch may have introduced a few risks of that sort.

Agreed.  Maybe we should boot RecentGlobalXmin with
InvalidTransactionId, and check, wherever it's going to be used, that
it's not that.

> I'm thinking that maybe an appropriate fix is to insert a
> GetTransactionSnapshot call at the beginning of InitPostgres'
> transaction, thus ensuring that every backend has some vaguely sane
> value for RecentGlobalXmin before it tries to do any database access.

AFAIR there's an "initial transaction" in InitPostgres or something
like that.  Since it goes away quickly, it'd be a good place to ensure
the snapshot does not last much longer.

> Another thought is that even with that, an autovac worker is likely
> to reach vac_update_datfrozenxid with a RecentGlobalXmin value that
> was computed at the start of its run, and is thus rather old.
> I wonder why vac_update_datfrozenxid is using the variable at all
> rather than doing GetOldestXmin?  It's not like that function is
> so performance-critical that it needs to avoid calling GetOldestXmin.

The function is called only once per autovacuum iteration, and once in
manually-invoked vacuum, so certainly it's not performance-critical.

> Lastly, now that we have the PROC_IN_VACUUM test in GetSnapshotData,
> is it actually necessary for lazy vacuum to avoid setting a snapshot?
> It seems like it might be a good idea for it to do so in order to
> keep its RecentGlobalXmin reasonably current.

Hmm, I think I'd rather be inclined to get a snapshot just when it's
going to finish.
That way, RecentGlobalXmin will be up to date even if the

> I've only looked at this in HEAD, but I am thinking that we have
> a real problem here in both HEAD and 8.3.  I'm less sure how bad
> things are in the older branches.

8.2 does contain the vac_update_datfrozenxid problem at the very least.
Older versions do not have that logic, so they are probably safe.

-- 
Alvaro Herrera                        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)