On Mon, May 20, 2019 at 3:10 AM Jason Harvey <ja...@reddit.com> wrote: > This week I upgraded one of my large(2.8TB), high-volume databases from 9 to > 11. The upgrade itself went fine. About two days later, we unexpectedly hit > transaction ID wraparound. What was perplexing about this was that the age of > our oldest `datfrozenxid` was only 1.2 billion - far away from where I'd > expect a wraparound. Curiously, the wraparound error referred to a mysterious > database of `OID 0`: > > UPDATE ERROR: database is not accepting commands to avoid wraparound data > loss in database with OID 0 > > We were able to recover after a few hours by greatly speeding up our vacuum > on our largest table. > > In a followup investigation I uncovered the reason we hit the wraparound so > early, and also the cause of the mysterious OID 0 message. When pg_upgrade > executes, it calls pg_resetwal to set the next transaction ID. Within > pg_resetwal is the following code: > https://github.com/postgres/postgres/blob/6cd404b344f7e27f4d64555bb133f18a758fe851/src/bin/pg_resetwal/pg_resetwal.c#L440-L450 > > This sets the controldata to have a fake database (OID 0) on the brink of > transaction wraparound. Specifically, after pg_upgrade is ran, wraparound > will occur within around 140 million transactions (provided the autovacuum > doesn't finish first). I confirmed by analyzing our controldata before and > after the upgrade that this was the cause of our early wraparound. > > Given the size and heavy volume of our database, we tend to complete a vacuum > in the time it takes around 250 million transactions to execute. With our > tunings this tends to be rather safe and we stay well away from the > wraparound point under normal circumstances.
This does seem like an unfriendly behavior. Moving the thread over to the -hackers list for further discussion... -- Peter Geoghegan