Re: [GENERAL] Massive delete from a live production DB
On 12.5.2011 16:23, Phoenix Kiula wrote:
> Hi
>
> Been reading some old threads (pre-9.x versions) and it seems that the
> consensus is to avoid doing massive deletes from a table, as it'll
> create so much unrecoverable space/gaps that a vacuum full would be
> needed. Etc.
>
> Instead, we might as well do a dump/restore. Faster, cleaner.
>
> This is all well and good, but what about a situation where the
> database is in production and cannot be brought down for this
> operation, or even a cluster?
>
> Any ideas on what I could do without losing all the live updates? I
> need to get rid of about 11% of a table of 150 million rows, with each
> row being roughly 1 to 5 KB in size...
>
> Thanks! Version is 9.0.4.

One of the usual recipes in such cases is partitioning. If you can
divide the data so that a delete is equal to dropping a partition, then
you don't need to worry about vacuum etc.

But partitioning has its own problems - you can't reference the
partitioned table using foreign keys, the query plans are often not as
efficient as with a non-partitioned table, etc.

regards
Tomas

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
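As a rough sketch of the partition-drop approach on 9.0 (which only has
inheritance-based partitioning; the table and column names here are made
up for illustration):

```sql
-- Parent table plus one child per month; the CHECK constraint lets the
-- planner skip irrelevant partitions (constraint_exclusion = partition).
CREATE TABLE clicks (id bigint, created date, payload text);

CREATE TABLE clicks_2011_04 (
    CHECK (created >= DATE '2011-04-01' AND created < DATE '2011-05-01')
) INHERITS (clicks);

-- "Deleting" a month is then a near-instant metadata change that
-- leaves no dead tuples behind, so no VACUUM FULL is ever needed:
DROP TABLE clicks_2011_04;
```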
Re: [GENERAL] Massive delete from a live production DB
On 12.5.2011 17:37, Phoenix Kiula wrote:
> [mydomain] src > cd pg_reorg-1.1.5
> [mydomain] pg_reorg-1.1.5 > gmake
> Makefile:13: ../../src/Makefile.global: No such file or directory
> gmake: *** No rule to make target `../../src/Makefile.global'.  Stop.
> [mydomain] pg_reorg-1.1.5 >
>
> What am I missing?

Do you have the source, or just a binary package? To compile such a
contrib package you need a properly configured source tree.

If you already have the sources available, you need to run configure
(because that's what produces src/Makefile.global), and then build the
contrib module again. If you don't have the sources, you can download
the distribution at postgresql.org - but maybe there's a src package
for your distro.

> PS. If pg_reorg is such a useful contribution, why can't it be
> included with PG? Seems like a very useful tool anyway! No?

There are a lot of packages that might be included in the default
install, but that would impose a lot of responsibility to maintain
them.

regards
Tomas
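In other words, something along these lines (the version number, URLs
and paths are examples only - adjust them to your environment):

```shell
# Fetch and configure a source tree matching the installed server
wget http://ftp.postgresql.org/pub/source/v9.0.4/postgresql-9.0.4.tar.gz
tar xzf postgresql-9.0.4.tar.gz
cd postgresql-9.0.4
./configure               # this is what generates src/Makefile.global

# Unpack pg_reorg inside contrib/ so its Makefile finds the tree
cd contrib
tar xzf /path/to/pg_reorg-1.1.5.tar.gz
cd pg_reorg-1.1.5
gmake
gmake install
```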
Re: [GENERAL] Massive delete from a live production DB
On Thu, May 12, 2011 at 8:23 AM, Phoenix Kiula wrote:
> Hi
>
> Been reading some old threads (pre-9.x versions) and it seems that the
> consensus is to avoid doing massive deletes from a table, as it'll
> create so much unrecoverable space/gaps that a vacuum full would be
> needed. Etc.
>
> Any ideas on what I could do without losing all the live updates? I
> need to get rid of about 11% of a table of 150 million rows, with each
> row being roughly 1 to 5 KB in size...

11% is not a big deal, as the space will get re-used for future updates
and inserts.
Re: [GENERAL] Massive delete from a live production DB
[snip]
> You can easily install it as a contrib. Just read the installation
> guide or the man page.

Thanks Eric. How though? The instructions here --
http://reorg.projects.postgresql.org/pg_reorg.html -- are woefully
incomplete.

I have a standard PG install on a WHM/cPanel type server. I know the
path to pgsql. I can download pg_reorg-1.1.5.tar.gz into this folder
and untar it. Then what? A "make" and "make install" does not work --
the usual "./configure" stuff is not available. Sorry, I need more
detailed steps.

I googled and found this:
http://www.postgresql.org/docs/9.0/static/contrib.html

But the recommended steps:

    gmake
    gmake install

...don't work either. Here's what I see:

    [mydomain] src > cd pg_reorg-1.1.5
    [mydomain] pg_reorg-1.1.5 > gmake
    Makefile:13: ../../src/Makefile.global: No such file or directory
    gmake: *** No rule to make target `../../src/Makefile.global'.  Stop.
    [mydomain] pg_reorg-1.1.5 >

What am I missing?

PS. If pg_reorg is such a useful contribution, why can't it be included
with PG? Seems like a very useful tool anyway! No?

Thanks.
Re: [GENERAL] Massive delete from a live production DB
On Thu, May 12, 2011 at 17:23, Phoenix Kiula wrote:
> Been reading some old threads (pre-9.x versions) and it seems that the
> consensus is to avoid doing massive deletes from a table, as it'll
> create so much unrecoverable space/gaps that a vacuum full would be
> needed. Etc.

Just running DELETE with normal autovacuum won't *shrink* the physical
table, but the freed-up space will be made available for future
inserts/updates. No problem there. Fragmentation of newly inserted
records is still a potential issue.

It's true that on pre-8.4 PostgreSQL versions you could run into "dead
space" that couldn't be re-used if you had a badly tuned FSM. I presume
this is why VACUUM FULL was recommended -- but this advice no longer
applies to 8.4 or 9.0.

> Instead, we might as well do a dump/restore. Faster, cleaner.
>
> Any ideas on what I could do without losing all the live updates? I
> need to get rid of about 11% of a table of 150 million rows, with each
> row being roughly 1 to 5 KB in size...

For deleting 11%, a dump and restore of 150 million records and
hundreds of gigabytes doesn't seem worth it. If it were closer to 50%,
then I'd consider it.

Regards,
Marti
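So the simple route is just the DELETE itself, optionally followed by a
manual VACUUM if you don't want to wait for autovacuum. A sketch with
made-up table and column names:

```sql
-- Remove ~11% of the rows; the dead tuples stay in the file...
DELETE FROM links WHERE created < DATE '2008-01-01';

-- ...until a plain VACUUM marks that space reusable. Unlike VACUUM
-- FULL, this does not take an exclusive lock or rewrite the table:
VACUUM ANALYZE links;
```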
Re: [GENERAL] Massive delete from a live production DB
In response to Phoenix Kiula:
> Hi
>
> Been reading some old threads (pre-9.x versions) and it seems that the
> consensus is to avoid doing massive deletes from a table, as it'll
> create so much unrecoverable space/gaps that a vacuum full would be
> needed. Etc.
>
> Instead, we might as well do a dump/restore. Faster, cleaner.
>
> This is all well and good, but what about a situation where the
> database is in production and cannot be brought down for this
> operation, or even a cluster?
>
> Any ideas on what I could do without losing all the live updates? I
> need to get rid of about 11% of a table of 150 million rows, with each
> row being roughly 1 to 5 KB in size...

Have you considered the following process:

1) SELECT the rows you want to keep into a new table (time-consuming)
2) Start outage
3) Pull over any new rows that might have been added between 1 & 2
4) Drop the old table
5) Rename the new table to the old name
6) Any other steps required to make the new table exactly like the old
   one (i.e. foreign keys, serials, etc.)
7) End outage window

Because steps 3 - 6 are very fast, your outage window is very short.

Not a perfect, zero-downtime solution, but possibly helpful.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
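The steps above might look like this in SQL (the table names, the
keep-condition, and the assumption of a monotonically increasing
primary key "id" are all hypothetical):

```sql
-- 1) Copy the rows to keep; slow, but the live table stays usable:
CREATE TABLE links_new AS
    SELECT * FROM links WHERE created >= DATE '2008-01-01';

-- 2) Outage starts here.
-- 3) Pull over rows added since step 1 (assumes ids only grow):
INSERT INTO links_new
    SELECT * FROM links
     WHERE id > (SELECT max(id) FROM links_new)
       AND created >= DATE '2008-01-01';

-- 4) + 5) Swap the tables:
DROP TABLE links;
ALTER TABLE links_new RENAME TO links;

-- 6) Recreate indexes, constraints, defaults, grants, etc., e.g.:
ALTER TABLE links ADD PRIMARY KEY (id);

-- 7) Outage ends.
```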
Re: [GENERAL] Massive delete from a live production DB
On 12.05.2011 16:38, Phoenix Kiula wrote:
> On Thu, May 12, 2011 at 10:33 PM, Eric Ndengang wrote:
>> On 12.05.2011 16:23, Phoenix Kiula wrote:
>>> Hi
>>>
>>> Been reading some old threads (pre-9.x versions) and it seems that
>>> the consensus is to avoid doing massive deletes from a table, as
>>> it'll create so much unrecoverable space/gaps that a vacuum full
>>> would be needed. Etc.
>>>
>>> Instead, we might as well do a dump/restore. Faster, cleaner.
>>>
>>> This is all well and good, but what about a situation where the
>>> database is in production and cannot be brought down for this
>>> operation, or even a cluster?
>>>
>>> Any ideas on what I could do without losing all the live updates? I
>>> need to get rid of about 11% of a table of 150 million rows, with
>>> each row being roughly 1 to 5 KB in size...
>>>
>>> Thanks! Version is 9.0.4.
>>
>> Hey,
>> try to use pg_reorg --> http://reorg.projects.postgresql.org
>> but the table must have a primary key.
>>
>> regards
>
> Thanks Eric. I do have a primary key. I am on version 9.0.4. Will
> pg_reorg work with this version too? The example on that website
> mentions 8.3.
>
> Also, is it a fast process that does not consume too many resources?
> This DB is behind a very high-traffic website, so I cannot have a
> CLUSTER alternative like pg_reorg making my DB very slow concurrently.
>
> How does one install the patch easily on CentOS (Linux) 64-bit?
>
> Thanks!

Hi,

> I am on version 9.0.4. Will pg_reorg work with this version too? The
> example on that website mentions 8.3.

I used to use pg_reorg on version 8.4.8, and according to the
documentation it will also work with version 9.0.

> How does one install the patch easily on CentOS (Linux) 64-bit?

You can easily install it as a contrib. Just read the installation
guide or the man page.

> Also, is it a fast process that does not consume too many resources?

Yes, it's a fast process that is neither time- nor resource-consuming.
The reorganization of a table with about 60 million rows could take
less than 8 minutes without higher CPU cost.

cheers

--
Eric Ndengang
Datenbankadministrator
Affinitas GmbH | Kohlfurter Straße 41/43 | 10999 Berlin | Germany
email: eric.ndengang_fo...@affinitas.de
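Once built, invoking it is a one-liner. The option names below are
taken from the pg_reorg documentation of that era and the database and
table names are examples - double-check both against the man page for
your installed version:

```shell
# Rebuild one table online, ordered by its primary key (pg_reorg
# copies rows to a fresh table and swaps it in under brief locks):
pg_reorg -d mydb -t mytable

# Or just compact it without re-sorting (a VACUUM FULL-like result,
# but without the long exclusive lock):
pg_reorg -n -d mydb -t mytable
```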
[GENERAL] Massive delete from a live production DB
Hi

Been reading some old threads (pre-9.x versions) and it seems that the
consensus is to avoid doing massive deletes from a table, as it'll
create so much unrecoverable space/gaps that a vacuum full would be
needed. Etc.

Instead, we might as well do a dump/restore. Faster, cleaner.

This is all well and good, but what about a situation where the
database is in production and cannot be brought down for this
operation, or even a cluster?

Any ideas on what I could do without losing all the live updates? I
need to get rid of about 11% of a table of 150 million rows, with each
row being roughly 1 to 5 KB in size...

Thanks! Version is 9.0.4.