On Mon, May 20, 2019 at 3:10 AM Jason Harvey <ja...@reddit.com> wrote:
> This week I upgraded one of my large(2.8TB), high-volume databases from 9 to 
> 11. The upgrade itself went fine. About two days later, we unexpectedly hit 
> transaction ID wraparound. What was perplexing about this was that the age of 
> our oldest `datfrozenxid` was only 1.2 billion - far away from where I'd 
> expect a wraparound. Curiously, the wraparound error referred to a mysterious 
> database of `OID 0`:
>
> UPDATE ERROR:  database is not accepting commands to avoid wraparound data 
> loss in database with OID 0
>
> We were able to recover after a few hours by greatly speeding up our vacuum 
> on our largest table.
>
> In a followup investigation I uncovered the reason we hit the wraparound so 
> early, and also the cause of the mysterious OID 0 message. When pg_upgrade 
> executes, it calls pg_resetwal to set the next transaction ID. Within 
> pg_resetwal is the following code: 
> https://github.com/postgres/postgres/blob/6cd404b344f7e27f4d64555bb133f18a758fe851/src/bin/pg_resetwal/pg_resetwal.c#L440-L450
>
> This sets the controldata to have a fake database (OID 0) on the brink of 
> transaction wraparound. Specifically, after pg_upgrade is ran, wraparound 
> will occur within around 140 million transactions (provided the autovacuum 
> doesn't finish first). I confirmed by analyzing our controldata before and 
> after the upgrade that this was the cause of our early wraparound.
>
> Given the size and heavy volume of our database, we tend to complete a vacuum 
> in the time it takes around 250 million transactions to execute. With our 
> tunings this tends to be rather safe and we stay well away from the 
> wraparound point under normal circumstances.

This does seem like an unfriendly behavior. Moving the thread over to
the -hackers list for further discussion...

-- 
Peter Geoghegan


Reply via email to