On Fri, Apr 24, 2015 at 3:04 PM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote: > Heikki Linnakangas wrote: >> Hmm, AFAICT pg_reorg is much more complex, writing stuff to a temp table >> and swapping relfilenodes afterwards. More like the VACUUM REWRITE >> that's been discussed. >> >> For the kicks, I looked at what it would take to write a utility like >> that. It turns out to be quite trivial, patch attached. It uses the same >> principle as VACUUM FULL, scans from the end, moving tuples to >> lower-numbered pages until it can't do it anymore. It requires a small >> change to heap_update(), to override the preference to store the new >> tuple on the same page as the old one, but other than that, it's all in >> the external module. > > More than five years have passed since Heikki posted this, and we still > haven't found a solution to the problem -- which neverthless keeps > biting people to the point that multiple "user-space" implementations of > similar techniques are out there.
Yeah. The problem with solving this with an update is that a concurrent "real" update may not see the expected behavior, especially at higher isolation levels. Tom also complained that the CTID will change, and somebody might care about that. But I think it's pretty clear that a lot of people will be able to live with those problems, and those who can't will be no worse off than now. > I think what we need here is something that does heap_update to tuples > at the end of the table, moving them to earlier pages; then wait for old > snapshots to die (the infrastructure for which we have now, thanks to > CREATE INDEX CONCURRENTLY); then truncate the empty pages. Of course, > there are lots of details to resolve. It doesn't really matter that > this runs for long: a process doing this for hours might be better than > AccessExclusiveLock on the table for a much shorter period. Why do you need to do anything other than update the tuples and let autovacuum clean up the mess? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers