[GENERAL] Massive delete from a live production DB

2011-05-12 Thread Phoenix Kiula
Hi

Been reading some old threads (pre-9.x versions) and it seems that the
consensus is to avoid doing massive deletes from a table, as they
create so much unrecoverable space/gaps that a VACUUM FULL would be
needed. Etc.

Instead, we might as well do a dump/restore. Faster, cleaner.

This is all well and good, but what about a situation where the
database is in production and cannot be brought down for this
operation, or even for a CLUSTER?

Any ideas on what I could do without losing all the live updates? I
need to get rid of about 11% of a 150-million-row database, with
each row being roughly 1 to 5 KB in size...

Thanks! Version is 9.0.4.



Re: [GENERAL] Massive delete from a live production DB

2011-05-12 Thread Eric Ndengang

On 12.05.2011 16:38, Phoenix Kiula wrote:

On Thu, May 12, 2011 at 10:33 PM, Eric Ndengang
eric.ndengang_fo...@affinitas.de  wrote:

On 12.05.2011 16:23, Phoenix Kiula wrote:

Hi

Been reading some old threads (pre 9.x version) and it seems that the
consensus is to avoid doing massive deletes from a table as it'll
create so much unrecoverable space/gaps that vacuum full would be
needed. Etc.

Instead, we might as well do a dump/restore. Faster, cleaner.

This is all well and good, but what about a situation where the
database is in production and cannot be brought down for this
operation or even a cluster?

Any ideas on what I could do without losing all the live updates? I
need to get rid of about 11% of a 150 million rows of database, with
each row being nearly 1 to 5 KB in size...

Thanks! Version is 9.0.4.


Hey,
try to use pg_reorg -- http://reorg.projects.postgresql.org
but the table must have a primary key.
regards




Thanks Eric.

I do have a primary key.

I am on version 9.0.4. Will pg_reorg work with this version too? The
example on that website  mentions 8.3.

Also, is it a fast process that does not consume too many resources?
This DB is behind a very high traffic website, so I cannot have a
CLUSTER-like alternative such as pg_reorg making my DB very slow while it runs.

How does one install the patch easily on CentOS (Linux) 64 bit?

Thanks!

Hi,

/* I am on version 9.0.4. Will pg_reorg work with this version too? The
example on that website  mentions 8.3. */

I have used pg_reorg on version 8.4.8, and according to the documentation
it will also work with the 9.0 version.


/* How does one install the patch easily on CentOS (Linux) 64 bit? */

You can easily install it as a contrib module. Just read the installation
guide or the man page.


/*

Also, it it a fast process that does not consume too much resource?
This DB is behind a very high traffic website, so I cannot have a
CLUSTER alternative like pg_reog making my DB very slow concurrently.*/

Yes, it's a fast process that consumes neither much time nor many resources. The
reorganization of a table with about 60 million rows can take less than 8 minutes
without significant CPU cost.

cheers

--
Eric Ndengang
Datenbankadministrator

Affinitas GmbH  |  Kohlfurter Straße 41/43  |  10999 Berlin  |  Germany
email: eric.ndengang_fo...@affinitas.de  | tel: +49.(0)30. 991 949 5 0  |  
www.edarling.de

Managing Directors: Lukas Brosseder, David Khalil, Kai Rieke, Christian Vollmann
Registered with the Amtsgericht Berlin (district court), HRB 115958

Real People:  www.edarling.de/echte-paare
Real Love:    www.youtube.de/edarling
Real Science: www.edarling.org




Re: [GENERAL] Massive delete from a live production DB

2011-05-12 Thread Bill Moran
In response to Phoenix Kiula phoenix.ki...@gmail.com:

 Hi
 
 Been reading some old threads (pre 9.x version) and it seems that the
 consensus is to avoid doing massive deletes from a table as it'll
 create so much unrecoverable space/gaps that vacuum full would be
 needed. Etc.
 
 Instead, we might as well do a dump/restore. Faster, cleaner.
 
 This is all well and good, but what about a situation where the
 database is in production and cannot be brought down for this
 operation or even a cluster?
 
 Any ideas on what I could do without losing all the live updates? I
 need to get rid of about 11% of a 150 million rows of database, with
 each row being nearly 1 to 5 KB in size...

Have you considered the following process:

1) SELECT the rows you want to keep into a new table (time-consuming)
2) Start outage
3) Pull over any new rows that might have been added between steps 1 and 2
4) Drop the old table
5) Rename the new table to the old name
6) Any other steps required to make the new table exactly like
   the old one (e.g. foreign keys, serials, etc.)
7) End outage window

Because steps 3 - 6 are very fast, your outage window is very short.
Not a perfect, zero-downtime solution, but possibly helpful.
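
As a rough SQL sketch -- the table name "links", the "created_at" cutoff
and the serial primary key "id" below are all made-up placeholders, so
adapt them to your own schema:

    -- 1) Copy the rows you want to keep (slow, but runs while the site is live)
    CREATE TABLE links_new AS
    SELECT * FROM links
    WHERE created_at >= '2010-01-01';

    -- 2) Start outage.
    -- 3) Pull over rows added since step 1, assuming a monotonically
    --    increasing primary key.
    INSERT INTO links_new
    SELECT * FROM links
    WHERE id > (SELECT max(id) FROM links_new)
      AND created_at >= '2010-01-01';

    -- 4/5) Swap the tables (renaming instead of dropping immediately is safer).
    BEGIN;
    ALTER TABLE links RENAME TO links_old;
    ALTER TABLE links_new RENAME TO links;
    -- 6) Recreate indexes, constraints, defaults, grants, etc.
    ALTER TABLE links ADD PRIMARY KEY (id);
    COMMIT;

    -- 7) End outage, then drop the old table at leisure.
    DROP TABLE links_old;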

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/



Re: [GENERAL] Massive delete from a live production DB

2011-05-12 Thread Marti Raudsepp
On Thu, May 12, 2011 at 17:23, Phoenix Kiula phoenix.ki...@gmail.com wrote:
 Been reading some old threads (pre 9.x version) and it seems that the
 consensus is to avoid doing massive deletes from a table as it'll
 create so much unrecoverable space/gaps that vacuum full would be
 needed. Etc.

Just running DELETE with normal autovacuum won't *shrink* the physical
table, but the freed-up space will be made available for future
inserts/updates. No problem there. Fragmentation of newly inserted
records is still a potential issue.
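
To illustrate with a hypothetical "links" table and "created_at" column
(placeholders, adjust to your schema), you can delete in batches and let
autovacuum reclaim the space between them:

    -- Delete in moderate chunks so locks stay short and autovacuum keeps up
    DELETE FROM links
    WHERE id IN (
        SELECT id FROM links
        WHERE created_at < '2010-01-01'
        LIMIT 100000
    );

    -- Dead tuples shown here become reusable space once autovacuum has run
    SELECT n_live_tup, n_dead_tup, last_autovacuum
    FROM pg_stat_user_tables
    WHERE relname = 'links';

    -- The on-disk size won't shrink, but new rows will reuse the freed space
    SELECT pg_size_pretty(pg_total_relation_size('links'));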

It's true that on pre-8.4 PostgreSQL versions you could run into dead
space that couldn't be re-used if you had a badly tuned FSM. I presume
this is why VACUUM FULL was recommended -- but this advice no longer
applies to 8.4 or 9.0.

 Instead, we might as well do a dump/restore. Faster, cleaner.

 Any ideas on what I could do without losing all the live updates? I
 need to get rid of about 11% of a 150 million rows of database, with
 each row being nearly 1 to 5 KB in size...

For deleting 11%, a dump and restore of 150 million records and
hundreds of gigabytes doesn't seem worth it. If it were closer to 50%,
then I'd consider it.

Regards,
Marti



Re: [GENERAL] Massive delete from a live production DB

2011-05-12 Thread Phoenix Kiula
[snip]


 You can easily install it as a contrib . Just read the installation guide or
 the man Page.



Thanks Eric. How though?

The instructions here --
http://reorg.projects.postgresql.org/pg_reorg.html  -- are woefully
incomplete.

I have a standard PG install on a WHM/cPanel-type server. I know the
path to pgsql. I can download pg_reorg.1.1.5.tar.gz into this
folder and untar it. Then what? A make and make install do not
work -- the usual ./configure step is not available.

Sorry, I need more detailed steps.

I googled and found this:
http://www.postgresql.org/docs/9.0/static/contrib.html

But the recommended steps:

   gmake
   gmake install

...don't work either. Here's what I see:



[mydomain] src  cd pg_reorg-1.1.5

[mydomain] pg_reorg-1.1.5  gmake
Makefile:13: ../../src/Makefile.global: No such file or directory
gmake: *** No rule to make target `../../src/Makefile.global'.  Stop.

[mydomain] pg_reorg-1.1.5 



What am I missing?

PS. If pg_reorg is such a useful contribution, why can't it be
included with PG? Seems like a very useful tool anyway! No?

Thanks.



Re: [GENERAL] Massive delete from a live production DB

2011-05-12 Thread Scott Marlowe
On Thu, May 12, 2011 at 8:23 AM, Phoenix Kiula phoenix.ki...@gmail.com wrote:
 Hi

 Been reading some old threads (pre 9.x version) and it seems that the
 consensus is to avoid doing massive deletes from a table as it'll
 create so much unrecoverable space/gaps that vacuum full would be
 needed. Etc.

 Any ideas on what I could do without losing all the live updates? I
 need to get rid of about 11% of a 150 million rows of database, with
 each row being nearly 1 to 5 KB in size...

11% is not a big deal, as the space will get re-used for future updates
and inserts.



Re: [GENERAL] Massive delete from a live production DB

2011-05-12 Thread Tomas Vondra
On 12.5.2011 17:37, Phoenix Kiula wrote:
 [mydomain] src  cd pg_reorg-1.1.5
 
 [mydomain] pg_reorg-1.1.5  gmake
 Makefile:13: ../../src/Makefile.global: No such file or directory
 gmake: *** No rule to make target `../../src/Makefile.global'.  Stop.
 
 [mydomain] pg_reorg-1.1.5 
 
 
 What am I missing?

Do you have the source or just a binary package? To compile such a contrib
package you need a properly configured source tree.

If you do have the sources already available, you need to run configure
(because that's what produces the src/Makefile.global). And then build
the contrib module again.

If you don't have the sources, you can download the distribution at
postgresql.org. But maybe there's a src package for your distro.

 PS. If pg_reorg is such a useful contribution, why can't it be
 included with PG? Seems like a very useful tool anyway! No?

There are a lot of packages that might be included in the default
install, but that would put a lot of responsibility on the project to maintain them.

regards
Tomas



Re: [GENERAL] Massive delete from a live production DB

2011-05-12 Thread Tomas Vondra
On 12.5.2011 16:23, Phoenix Kiula wrote:
 Hi
 
 Been reading some old threads (pre 9.x version) and it seems that the
 consensus is to avoid doing massive deletes from a table as it'll
 create so much unrecoverable space/gaps that vacuum full would be
 needed. Etc.
 
 Instead, we might as well do a dump/restore. Faster, cleaner.
 
 This is all well and good, but what about a situation where the
 database is in production and cannot be brought down for this
 operation or even a cluster?
 
 Any ideas on what I could do without losing all the live updates? I
 need to get rid of about 11% of a 150 million rows of database, with
 each row being nearly 1 to 5 KB in size...
 
 Thanks! Version is 9.0.4.

One possible recipe in such cases is partitioning. If you can divide
the data so that a delete is equivalent to dropping a partition, then
you don't need to worry about vacuum etc.

But partitioning has its own problems - you can't reference the
partitioned table using foreign keys, query plans are often not as
efficient as with a non-partitioned table, etc.
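
A minimal sketch of 9.0-style inheritance partitioning (the "links" table,
"created_at" column and date ranges are just placeholders):

    -- Each child carries a CHECK constraint describing its slice of the data
    CREATE TABLE links_2011_04 (
        CHECK (created_at >= DATE '2011-04-01' AND created_at < DATE '2011-05-01')
    ) INHERITS (links);

    CREATE TABLE links_2011_05 (
        CHECK (created_at >= DATE '2011-05-01' AND created_at < DATE '2011-06-01')
    ) INHERITS (links);

    -- Inserts must be routed to the right child (trigger or application logic)

    -- Getting rid of a month of data is then just a catalog operation,
    -- no DELETE and no vacuum:
    DROP TABLE links_2011_04;

With constraint_exclusion = partition (the default since 8.4), queries that
filter on created_at will skip the children that cannot match.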

regards
Tomas
