Re: [ADMIN] Bloated pg_shdepend_depender_index
TODO already has:

	* Improve speed with indexes

	  For large table adjustments during VACUUM FULL, it is faster to
	  reindex rather than update the index.

---

Tom Lane wrote:
> Peter Eisentraut <[EMAIL PROTECTED]> writes:
> > Am Freitag, 24. März 2006 05:48 schrieb Tom Lane:
> >> Well, the VACUUM FULL algorithm is incapable of shrinking indexes ---
> >> the only way is REINDEX, or something else that reconstructs indexes
> >> from scratch, such as CLUSTER.  One of the things we need to look into
> >> is putting more smarts into VACUUM so that it automatically does
> >> something reasonable when faced with extreme cases like these.
>
> > If the user is running VACUUM FULL, he has presumably determined that the
> > table is too bloated to be recovered in a graceful way, and quite likely
> > the indexes are going to be bloated similarly.  So seemingly one might as
> > well launch a reindexing on the table after VACUUM FULL has done its
> > thing.  Whether that should be automatic is another question but perhaps
> > the advice should be documented somewhere?
>
> Actually, I wonder whether VACUUM FULL shouldn't be thrown away and
> replaced by something else entirely.  That algorithm only really works
> nicely when just a small percentage of the rows need to be moved to
> re-compact the table --- if you're moving lots of rows, it makes the
> index bloat situation *worse* not better because of the transient need
> for index entries pointing to both copies of moved rows.  Lazy VACUUM
> has become the de-facto standard for situations where there's not a huge
> amount of empty space, and so it's not clear where the sweet spot is for
> VACUUM FULL anymore.  If you've got enough disk space, a rewrite (like
> CLUSTER or ALTER TABLE) is going to blow the doors off VACUUM FULL,
> let alone VACUUM FULL plus REINDEX.  Not to mention that for
> sufficiently huge tables, VACUUM FULL fails outright because it runs out
> of RAM.
>
> We need to fix CLUSTER to make it MVCC-safe (ie, not discard
> recently-dead rows), and it'd be nice to have something like it that
> didn't worry about ordering but just did a seqscan of the source table.
> Then I'd be inclined to recommend that instead of VACUUM FULL for most
> cases of severe bloat.
>
> Unfortunately this all breaks down for shared system catalogs and the
> core (nailed-in) catalogs, because we can't change their relfilenodes
> and so the crash-safe CLUSTER/REINDEX approach doesn't work.  We still
> need a new idea or two there.
>
> 			regards, tom lane

--
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?
               http://archives.postgresql.org
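Tom's recommended alternatives can be sketched as psql commands. This is only a hedged illustration: the database, table, and index names (`mydb`, `bloated_table`, `bloated_table_pkey`) are made up for the example, and none of it helps the shared catalogs Tom mentions, whose relfilenodes cannot change.

```shell
# Path 1: compact the heap, then rebuild the bloated indexes from
# scratch, since VACUUM FULL alone cannot shrink them.
psql -d mydb -c 'VACUUM FULL bloated_table;'
psql -d mydb -c 'REINDEX TABLE bloated_table;'

# Path 2: with enough free disk space, a full rewrite via CLUSTER
# rebuilds heap and indexes in one pass (8.x syntax shown).
psql -d mydb -c 'CLUSTER bloated_table_pkey ON bloated_table;'
```

Both paths take an exclusive lock on the table for their duration, which is part of what the rest of this thread argues about.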
Re: [ADMIN] Bloated pg_shdepend_depender_index
adey wrote:
> > 2) Can vacuum full not be redesigned to run online without locking
> > tables and users,

This is already done.  See non-FULL VACUUM.

--
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
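For reference, the non-FULL (lazy) VACUUM Alvaro points to runs without the exclusive table lock and is typically scheduled as a routine job. A minimal sketch, with illustrative database and table names:

```shell
# Lazy VACUUM marks dead tuples reusable without locking out readers or
# writers; ANALYZE refreshes planner statistics in the same pass.
psql -d mydb -c 'VACUUM VERBOSE ANALYZE some_table;'
# It makes space available for reuse within the table but, unlike
# VACUUM FULL, it rarely returns disk space to the operating system.
```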
Re: [ADMIN] Bloated pg_shdepend_depender_index
Please could someone help me with my questions below?

On 3/25/06, adey <[EMAIL PROTECTED]> wrote:
> Two questions in this regard please?
> 1) Is tuple theory not the root of this problem
> 2) Vacuum does much the same as a traditional database reorg, and online
> reorgs are a reality now
> [...]

On 3/25/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> Peter Eisentraut <[EMAIL PROTECTED]> writes:
> > Am Freitag, 24. März 2006 05:48 schrieb Tom Lane:
> >> Well, the VACUUM FULL algorithm is incapable of shrinking indexes ---
> >> the only way is REINDEX, or something else that reconstructs indexes
> >> from scratch, such as CLUSTER.
> [...]
>
> 			regards, tom lane
Re: [ADMIN] Bloated pg_shdepend_depender_index
On Wed, Mar 29, 2006 at 07:19:10PM +0200, Rafael Martinez wrote:
> On Tue, 2006-03-28 at 17:17 -0600, Jim C. Nasby wrote:
> > On Wed, Mar 29, 2006 at 01:05:59AM +0200, Rafael Martinez wrote:
> > > I work with postgresql every day and I am very happy with it, but
> > > this does not mean I can not see the issues that could be improved
> > > to have an even better open source DBMS. And I think, in my humble
> > > opinion, that bloated indexes + better upgrade procedures between
> > > major releases are two things that should get improved in the future.
> >
> > FWIW, I don't know of any upgrade procedures for databases that can
> > quickly do an in-place upgrade when underlying file structures change,
> > because ultimately you need to read and write the entire database
> > on-disk.
>
> I know there is not an easy solution to the dump/restore procedure, and
> maybe it is not even possible (I am not a postgres developer and don't
> know the postgres internals or what is necessary to do between major
> releases). Do the file structures always change between major releases?

It depends. Sometimes the changes aren't as much in the files as they are
in the system catalogs.

Ideally, what we'd have is the ability to deal with data that was stored
in the last version's format. Any time an old row gets changed, it gets
re-written in the new format (probably on a different page). While this
would present some challenges, it would make for very, very fast upgrades.
Unfortunately, it would also greatly increase code complexity and
maintenance costs, so it's rather unlikely it will ever happen. Maybe if
someone forks over a very large sum of money, but even then it's
unlikely...

An actual upgrade script is more likely, but even there you still need to
have a backup (actually, that's really pretty true of both cases). This
idea does have some traction though, and if someone produced a working
utility there's a decent chance it would be accepted.

--
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
Re: [ADMIN] Bloated pg_shdepend_depender_index
On Tue, 2006-03-28 at 17:17 -0600, Jim C. Nasby wrote:
> On Wed, Mar 29, 2006 at 01:05:59AM +0200, Rafael Martinez wrote:
> > I work with postgresql every day and I am very happy with it, but this
> > does not mean I can not see the issues that could be improved to have
> > an even better open source DBMS. And I think, in my humble opinion,
> > that bloated indexes + better upgrade procedures between major
> > releases are two things that should get improved in the future.
>
> FWIW, I don't know of any upgrade procedures for databases that can
> quickly do an in-place upgrade when underlying file structures change,
> because ultimately you need to read and write the entire database
> on-disk.

I know there is not an easy solution to the dump/restore procedure, and
maybe it is not even possible (I am not a postgres developer and don't
know the postgres internals or what is necessary to do between major
releases). Do the file structures always change between major releases?

Today I asked some colleagues in the Oracle department, and in the past
they had to do this dump/restore procedure between major releases, but
not anymore. Now they start the database in a special mode (single?)
after upgrading the software and run some scripts that modify what needs
to be modified, without having to dump/restore all the data. The time
needed to run this process varies between versions: sometimes it goes
very fast, other times it takes longer, but they say this process is much
faster than the old dump/restore one.

regards,
--
Rafael Martinez, <[EMAIL PROTECTED]>
Center for Information Technology Services
University of Oslo, Norway
PGP Public Key: http://folk.uio.no/rafael/
Re: [ADMIN] Bloated pg_shdepend_depender_index
On Wed, 2006-03-29 at 13:16 +0100, Andy Shellam wrote:
> This is true, but you could run the new version on the same
> server/different port to the current version, and do a simultaneous
> dump/restore without having to use extra disk space, or taking your
> production database offline - then you can schedule a time to kill the
> old one, re-assign the port on the new one and bring it up - would take
> about 30 seconds, and this could be done at a time when traffic was at
> its lowest.

Running the new version on another port on the same server does not help
much. You have to initialize the new version with its own data
partitions, so you need double the disk space you have in production.
Another minus is that this works only on a 'read only' database: all the
updates from when you start the dump/restore process until you
kill/re-assign/start again won't be in the new version (and this can be a
lot of data in a busy/large system).

--
Rafael Martinez, <[EMAIL PROTECTED]>
Center for Information Technology Services
University of Oslo, Norway
PGP Public Key: http://folk.uio.no/rafael/
Re: [ADMIN] Bloated pg_shdepend_depender_index
Andy Shellam wrote:
> This is true, but you could run the new version on the same
> server/different port to the current version, and do a simultaneous
> dump/restore without having to use extra disk space, or taking your
> production database offline - then you can schedule a time to kill the
> old one, re-assign the port on the new one and bring it up - would take
> about 30 seconds, and this could be done at a time when traffic was at
> its lowest.

Hmmm - interesting solution! Do you know of anybody that has tried an
upgrade migration like this, and their experience with very large
databases?

--
Ola Sandbu       tel: (+47) 22840197
USIT/SAPP-DBA    mob: (+47) 91594704
mailto:[EMAIL PROTECTED]
Re: [ADMIN] Bloated pg_shdepend_depender_index
This is true, but you could run the new version on the same
server/different port to the current version, and do a simultaneous
dump/restore without having to use extra disk space, or taking your
production database offline - then you can schedule a time to kill the
old one, re-assign the port on the new one and bring it up - would take
about 30 seconds, and this could be done at a time when traffic was at
its lowest.

QUOTE:
Uh, for many sites minutes of downtime would be a huge issue, let alone
hours.

Fortunately, you can use Slony to migrate between versions, greatly
reducing downtime.
END QUOTE
Re: [ADMIN] Bloated pg_shdepend_depender_index
On Wed, Mar 29, 2006 at 01:05:59AM +0200, Rafael Martinez wrote:
> I work with postgresql every day and I am very happy with it, but this
> does not mean I can not see the issues that could be improved to have an
> even better open source DBMS. And I think, in my humble opinion, that
> bloated indexes + better upgrade procedures between major releases are
> two things that should get improved in the future.

FWIW, I don't know of any upgrade procedures for databases that can
quickly do an in-place upgrade when underlying file structures change,
because ultimately you need to read and write the entire database
on-disk. And unless you're going to be real gutsy, you'll also need a
good amount of extra storage somewhere, unless you plan on doing this
in-place upgrade without any backups. And at that point, you almost might
as well just do a dump and reload.

Given that, I don't think setting up a temporary Slony cluster to go
between major versions is that big a deal.

--
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
Re: [ADMIN] Bloated pg_shdepend_depender_index
On Tue, 2006-03-28 at 15:15 -0600, Scott Marlowe wrote:
> On Tue, 2006-03-28 at 13:51, Rafael Martinez wrote:
> > On Tue, 2006-03-28 at 12:09 -0600, Jim C. Nasby wrote:
> > >
> > > Fortunately, you can use Slony to migrate between versions, greatly
> > > reducing downtime.
> >
> > Yes, this is true, but the problem is that if the database is 'big'
> > and running on an expensive system you need to double your investment
> > only for the upgrade.
> >
> > I am sure we can find another way of doing this that is easier and
> > faster.
>
> Any large, production system should already have replication in place.
> If not, you're taking your chances with that big system that make any
> investment questionable. If downtime is a real problem.

You do not need replication in place to run a large system in a
responsible way. I agree that you need redundancy in many layers, but you
have to stop at some point.

We have, for example, some big/expensive disk arrays in our system that
have a lot of redundancy: redundant hot plug power supplies, redundant
hot plug blowers, redundant hot plug cache batteries, redundant hot plug
FC I/O modules, a redundant pair of controllers, etc, etc, and an
expensive 24x7 onsite hardware support contract. But we do not have an
extra disk array just in case the one in production stops working (it can
happen, but we have to trust it won't).

Fast SCSI disks in this type of array are not cheap, and the bosses are
not happy if you say that you need double the space just to upgrade the
system; this can be very expensive in a big system.

I work with postgresql every day and I am very happy with it, but this
does not mean I can not see the issues that could be improved to have an
even better open source DBMS. And I think, in my humble opinion, that
bloated indexes + better upgrade procedures between major releases are
two things that should get improved in the future.

regards,
--
Rafael Martinez, <[EMAIL PROTECTED]>
Center for Information Technology Services
University of Oslo, Norway
PGP Public Key: http://folk.uio.no/rafael/
Re: [ADMIN] Bloated pg_shdepend_depender_index
On Tue, 2006-03-28 at 13:51, Rafael Martinez wrote:
> On Tue, 2006-03-28 at 12:09 -0600, Jim C. Nasby wrote:
> >
> > Fortunately, you can use Slony to migrate between versions, greatly
> > reducing downtime.
>
> Yes, this is true, but the problem is that if the database is 'big' and
> running on an expensive system you need to double your investment only
> for the upgrade.
>
> I am sure we can find another way of doing this that is easier and
> faster.

Any large, production system should already have replication in place.
If not, and downtime is a real problem, you're taking chances with that
big system that make any investment questionable.
Re: [ADMIN] Bloated pg_shdepend_depender_index
On Tue, 2006-03-28 at 12:09 -0600, Jim C. Nasby wrote:
>
> Fortunately, you can use Slony to migrate between versions, greatly
> reducing downtime.

Yes, this is true, but the problem is that if the database is 'big' and
running on an expensive system you need to double your investment only
for the upgrade.

I am sure we can find another way of doing this that is easier and
faster.

--
Rafael Martinez, <[EMAIL PROTECTED]>
Center for Information Technology Services
University of Oslo, Norway
PGP Public Key: http://folk.uio.no/rafael/
Re: [ADMIN] Bloated pg_shdepend_depender_index
On Tue, Mar 28, 2006 at 03:54:30PM +0100, Andy Shellam wrote:
> QUOTE:
> Exactly this issue and that you have to make a 'full' dump/restore
> between major release is a big minus I hear everywhere I
> explain/discuss about postgres for 24/7 and big databases.
> END QUOTE
>
> Yes but how often does a major release come out? 17th Jan 2005 was the
> first date of 8.0 according to the FTP site, with 8.1 out in November
> 2005. If it takes a whole day to do a dump/restore, that's only one day
> out of about 270 or more.

Uh, for many sites minutes of downtime would be a huge issue, let alone
hours.

Fortunately, you can use Slony to migrate between versions, greatly
reducing downtime.

--
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
Re: [ADMIN] Bloated pg_shdepend_depender_index
QUOTE:
Exactly this issue and that you have to make a 'full' dump/restore
between major release is a big minus I hear everywhere I explain/discuss
about postgres for 24/7 and big databases.
END QUOTE

Yes, but how often does a major release come out? 17th Jan 2005 was the
first date of 8.0 according to the FTP site, with 8.1 out in November
2005. If it takes a whole day to do a dump/restore, that's only one day
out of about 270 or more.

Plus, in MS SQL Server, once you've moved on a version (e.g. 7 to 2000)
you can't go back - whereas in PGSQL you can just re-dump your data and
revert to the previous version, providing you're not using any features
the previous version doesn't understand - it gives much more flexibility
and more confidence in the upgrade.

Andy

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Rafael
Martinez Guerrero
Sent: Tuesday, 28 March, 2006 1:09 PM
To: Jim C. Nasby
Cc: Tom Lane; Peter Eisentraut; pgsql-admin@postgresql.org; Gregory Maxwell
Subject: Re: [ADMIN] Bloated pg_shdepend_depender_index

On Fri, 2006-03-24 at 17:43, Jim C. Nasby wrote:
> [...]
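The parallel-instance migration Andy describes is commonly done as a pipe between the two servers, which avoids an intermediate dump file (though the new cluster still needs its own on-disk copy of the data, as Rafael notes elsewhere in the thread). A hedged sketch; the port numbers are illustrative assumptions:

```shell
# Old major version listening on 5432; the new version's freshly
# initdb'ed cluster on 5433. Piping avoids writing a dump file to disk.
pg_dumpall -p 5432 | psql -p 5433 -d postgres

# After verifying the new cluster, stop the old server and re-assign
# the port (edit postgresql.conf on the new cluster and restart).
```

Any writes that land on the old cluster after the dump starts are not carried over, which is the read-only caveat raised above.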
Re: [ADMIN] Bloated pg_shdepend_depender_index
On Fri, 2006-03-24 at 17:43, Jim C. Nasby wrote:
> Therein lies part of the problem: enough disk space. Now that we're
> seeing more and more use of PostgreSQL in data warehousing, it's
> becoming less safe to assume you'll have enough disk space to fix bloat
> on large tables. Plus I suspect a lot of folks wouldn't be able to
> tolerate being locked out of a table for that long (of course that
> applies to VACUUM FULL as well...)

Hello

Exactly this issue, and that you have to make a 'full' dump/restore
between major releases, is a big minus I hear everywhere I
explain/discuss postgres for 24/7 and big databases.

It would be wonderful to see a solution to these two 'problems' in the
future, so postgres becomes an even better product than it is now.

--
Rafael Martinez, <[EMAIL PROTECTED]>
Center for Information Technology Services
University of Oslo, Norway
PGP Public Key: http://folk.uio.no/rafael/
Re: [ADMIN] Bloated pg_shdepend_depender_index
In the VACUUM FULL output, indexes are listed as having been vacuumed
along with their table. Have I misinterpreted the message below saying
that vacuum is incapable of shrinking indexes (and REINDEX is the only
option to do so), please?

On 3/25/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> Peter Eisentraut <[EMAIL PROTECTED]> writes:
> > Am Freitag, 24. März 2006 05:48 schrieb Tom Lane:
> >> Well, the VACUUM FULL algorithm is incapable of shrinking indexes ---
> >> the only way is REINDEX, or something else that reconstructs indexes
> >> from scratch, such as CLUSTER.
> [...]
>
> 			regards, tom lane
Re: [ADMIN] Bloated pg_shdepend_depender_index
Two questions in this regard please?
1) Is tuple theory not the root of this problem?
2) Vacuum does much the same as a traditional database reorg, and online
reorgs are a reality now.

1) If I understand tuple theory correctly, copies of rows are created
through normal Postgres processing, that expire after a period, leaving
multiple copies of redundant data to be vacuumed dead, then vacuumed out
(or both). Most databases have been built using one copy of a row with a
sophisticated locking control mechanism, which Postgres has some of
anyway, and the industry has developed methods and designs to exploit
locking to best advantage. Even with tuples, locking is still evident in
Postgres.

OR

2) Can vacuum full not be redesigned to run online without locking
tables and users, like a conventional online reorg? Eg: work on 1 data
page at a time instead of locking the whole table, with a shorter period
at the end to lock the table and "compress" the remaining populated data
pages and release disk space back to the OS; or one data file at a time;
and have vacuum full per table reduce / tidy up the wraparound value,
thereby avoiding a full DB vacuum for longer periods. In this way vacuum
can be performed regularly and be less intrusive. Nowadays 24x7 is more
of a reality for systems, and we can't afford to take systems down for
many hours to perform regular maintenance.

(It would be extremely helpful to DBAs with little OS experience or
access to have more automation in PGAdmin, especially task scheduling
and alerting, so SQL can be scheduled in PGAdmin instead of crontab,
which is usually a sysadmin function.)

On 3/25/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> Peter Eisentraut <[EMAIL PROTECTED]> writes:
> > Am Freitag, 24. März 2006 05:48 schrieb Tom Lane:
> >> Well, the VACUUM FULL algorithm is incapable of shrinking indexes ---
> >> the only way is REINDEX, or something else that reconstructs indexes
> >> from scratch, such as CLUSTER.
> [...]
>
> 			regards, tom lane
Re: [ADMIN] Bloated pg_shdepend_depender_index
On Fri, Mar 24, 2006 at 10:02:01AM -0500, Tom Lane wrote:
> Actually, I wonder whether VACUUM FULL shouldn't be thrown away and
> replaced by something else entirely. That algorithm only really works
> nicely when just a small percentage of the rows need to be moved to
> re-compact the table --- if you're moving lots of rows, it makes the
> index bloat situation *worse* not better because of the transient need
> for index entries pointing to both copies of moved rows. Lazy VACUUM
> has become the de-facto standard for situations where there's not a huge
> amount of empty space, and so it's not clear where the sweet spot is for
> VACUUM FULL anymore. If you've got enough disk space, a rewrite (like

Therein lies part of the problem: enough disk space. Now that we're seeing
more and more use of PostgreSQL in data warehousing, it's becoming less safe
to assume you'll have enough disk space to fix bloat on large tables. Plus I
suspect a lot of folks wouldn't be able to tolerate being locked out of a
table for that long (of course that applies to VACUUM FULL as well...)

There's a sorta-kinda solution available for the heap, involving repeated
cycles of vacuuming and then updating all the tuples off the last page, and
hopefully there will be some better possibilities in 8.2. But that still
leaves indexes. Are there any improvements that can be made in that regard?
I know it's a lot harder to move index tuples around, but surely it's not
impossible (I'd hope). Or, as an alternative, you could 'move' index tuples
by updating tuples in the heap and having some means to direct which index
pages the new entries should favor.

If there were some relatively easy means of compacting tables and indexes
that could operate in the background (i.e., one that doesn't need any
table-level locks), I suspect most of the need for things like VACUUM FULL,
REINDEX, and perhaps even CLUSTER would go away.

--
Jim C. Nasby, Sr. Engineering Consultant   [EMAIL PROTECTED]
Pervasive Software   http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
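The vacuum-and-update cycle described above can be sketched as SQL, though only as an illustration: the table name, column name, and page number are all hypothetical, and direct ctid range comparisons may not be available in all versions from this era, so treat this as pseudocode for the technique rather than a recipe.

```sql
-- Illustrative sketch of the heap-compaction workaround (names hypothetical).
VACUUM bloated_tbl;                          -- record free space on earlier pages
UPDATE bloated_tbl SET some_col = some_col   -- no-op update writes fresh row copies,
  WHERE ctid >= '(10000,0)'::tid;            -- ideally into that earlier free space
VACUUM bloated_tbl;                          -- truncate the now-empty tail pages
-- Repeat until the table stops shrinking; note the indexes are untouched.
```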
Re: [ADMIN] Bloated pg_shdepend_depender_index
Peter Eisentraut <[EMAIL PROTECTED]> writes:
> Am Freitag, 24. März 2006 05:48 schrieb Tom Lane:
>> Well, the VACUUM FULL algorithm is incapable of shrinking indexes ---
>> the only way is REINDEX, or something else that reconstructs indexes
>> from scratch, such as CLUSTER. One of the things we need to look into
>> is putting more smarts into VACUUM so that it automatically does
>> something reasonable when faced with extreme cases like these.

> If the user is running VACUUM FULL, he has presumably determined that the
> table is too bloated to be recovered in a graceful way, and quite likely the
> indexes are going to be bloated similarly. So seemingly one might as well
> launch a reindexing on the table after VACUUM FULL has done its thing.
> Whether that should be automatic is another question but perhaps the advice
> should be documented somewhere?

Actually, I wonder whether VACUUM FULL shouldn't be thrown away and
replaced by something else entirely. That algorithm only really works
nicely when just a small percentage of the rows need to be moved to
re-compact the table --- if you're moving lots of rows, it makes the
index bloat situation *worse* not better because of the transient need
for index entries pointing to both copies of moved rows. Lazy VACUUM
has become the de-facto standard for situations where there's not a huge
amount of empty space, and so it's not clear where the sweet spot is for
VACUUM FULL anymore. If you've got enough disk space, a rewrite (like
CLUSTER or ALTER TABLE) is going to blow the doors off VACUUM FULL,
let alone VACUUM FULL plus REINDEX. Not to mention that for
sufficiently huge tables, VACUUM FULL fails outright because it runs out
of RAM.

We need to fix CLUSTER to make it MVCC-safe (ie, not discard
recently-dead rows), and it'd be nice to have something like it that
didn't worry about ordering but just did a seqscan of the source table.
Then I'd be inclined to recommend that instead of VACUUM FULL for most
cases of severe bloat.

Unfortunately this all breaks down for shared system catalogs and the
core (nailed-in) catalogs, because we can't change their relfilenodes
and so the crash-safe CLUSTER/REINDEX approach doesn't work. We still
need a new idea or two there.

			regards, tom lane
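The two recovery paths Tom contrasts look roughly like this in 8.1-era syntax (table and index names are hypothetical):

```sql
-- In-place path: compacts the heap, but can leave the indexes *more*
-- bloated, so a REINDEX is needed afterwards.
VACUUM FULL bloated_tbl;
REINDEX TABLE bloated_tbl;

-- Rewrite path: CLUSTER copies the live rows into a fresh file and
-- rebuilds every index from scratch as part of the same operation.
CLUSTER bloated_tbl_pkey ON bloated_tbl;
```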
Re: [ADMIN] Bloated pg_shdepend_depender_index
Am Freitag, 24. März 2006 05:48 schrieb Tom Lane:
> Well, the VACUUM FULL algorithm is incapable of shrinking indexes ---
> the only way is REINDEX, or something else that reconstructs indexes
> from scratch, such as CLUSTER. One of the things we need to look into
> is putting more smarts into VACUUM so that it automatically does
> something reasonable when faced with extreme cases like these.

If the user is running VACUUM FULL, he has presumably determined that the
table is too bloated to be recovered in a graceful way, and quite likely the
indexes are going to be bloated similarly. So seemingly one might as well
launch a reindexing on the table after VACUUM FULL has done its thing.
Whether that should be automatic is another question, but perhaps the advice
should be documented somewhere?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Re: [ADMIN] Bloated pg_shdepend_depender_index
"Gregory Maxwell" <[EMAIL PROTECTED]> writes:
> So it's by design that these now-bloated indexes won't shrink if left
> unvacuumed? I didn't expect to hit something like that.

Well, the VACUUM FULL algorithm is incapable of shrinking indexes ---
the only way is REINDEX, or something else that reconstructs indexes
from scratch, such as CLUSTER. One of the things we need to look into
is putting more smarts into VACUUM so that it automatically does
something reasonable when faced with extreme cases like these.

			regards, tom lane
Re: [ADMIN] Bloated pg_shdepend_depender_index
On 3/23/06, Tom Lane <[EMAIL PROTECTED]> wrote:
>> amarokcollection=# select relname, pg_relation_size(oid) FROM
>> pg_class ORDER BY 2 DESC LIMIT 20;
>>              relname             | pg_relation_size
>> ---------------------------------+------------------
>>  pg_attribute_relid_attnam_index |       2881069056
>>  pg_class_relname_nsp_index      |       1185890304
>
> That's in a different database --- evidently, the one where a *whole*
> lotta temp tables have been created and dropped. If you haven't been
> doing any vacuuming then system catalog bloat is to be expected.

Oh, sure enough it is... I'm glad I copied the prompt. :)

So it's by design that these now-bloated indexes won't shrink if left
unvacuumed? I didn't expect to hit something like that. Okay, so long as it
wasn't a bug.

I've reindexed the system tables now. The database is now 6 MB on disk... a
big improvement from 6 GB. :) Thanks.
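The reindex of the system tables mentioned here would have been something along these lines (the database name is taken from the psql prompt in the thread; note that in this era the nailed-in catalog indexes may require a standalone backend to rebuild):

```sql
-- Run while connected to the affected database; rebuilds the system-catalog
-- indexes from scratch, which is what recovers the space VACUUM FULL cannot.
REINDEX SYSTEM amarokcollection;
```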
Re: [ADMIN] Bloated pg_shdepend_depender_index
"Gregory Maxwell" <[EMAIL PROTECTED]> writes:
> When I vacuum fulled, nothing else was connected... I just restarted PG
> and vacuumed again... no obvious change of disk size (still 6.4 gigs)...
> but this changed:
> amarokcollection=# select relname, pg_relation_size(oid) FROM
> pg_class ORDER BY 2 DESC LIMIT 20;
>              relname             | pg_relation_size
> ---------------------------------+------------------
>  pg_attribute_relid_attnam_index |       2881069056
>  pg_class_relname_nsp_index      |       1185890304

That's in a different database --- evidently, the one where a *whole*
lotta temp tables have been created and dropped. If you haven't been
doing any vacuuming then system catalog bloat is to be expected.

			regards, tom lane
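The kind of workload Tom is describing, reduced to its essentials (table name hypothetical):

```sql
-- Each cycle like this inserts rows into pg_class, pg_attribute, pg_type,
-- pg_depend, and friends, then deletes them again on DROP. Without
-- vacuuming, the dead catalog rows pile up and the catalog indexes bloat,
-- which matches the index names and sizes seen in this thread.
CREATE TEMP TABLE scratch (id integer, val text);
-- ... use the table ...
DROP TABLE scratch;
```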
Re: [ADMIN] Bloated pg_shdepend_depender_index
On 3/23/06, Alvaro Herrera <[EMAIL PROTECTED]> wrote:
> Gregory Maxwell wrote:
>> I recently noticed that this database has grown to a huge size. ...
>> Which I found to be somewhat odd because none of the tables have more
>> than around 1000 rows. I hadn't been vacuuming because I didn't
>> think that anything would ever be deleted, so I performed a vacuum
>> full... but no luck, it was still about 6.4GB.
>>
>> With some help of the folks on IRC I discovered...
>> postgres=# select relname, pg_relation_size(oid) FROM pg_class ORDER
>> BY 2 DESC LIMIT 2;
>>            relname            | pg_relation_size
>> -----------------------------+------------------
>>  pg_shdepend_depender_index  |        159465472
>>  pg_shdepend_reference_index |         97271808
>> (2 rows)
>
> Huh, that's very strange.
>
> The pg_shdepend table is a shared catalog, unlike most other catalogs.
> Still I don't see why the indexes would not shrink. Do pg_authid,
> pg_database, or pg_tablespace show a similar behavior? Is amarok
> creating a lot of temp tables or something? When you did that vacuum
> full, was there a process connected to the same or another database that
> could be holding onto the pg_shdepend table?
>
> I'll have a look at the code, but I thought I might throw that out, just
> in case it rings a bell.

Those were the only two objects impacted. I'm not sure what amarok does... a
quick glance at the source shows that it does use temp tables, but I'm not
sure how frequently.

When I vacuum fulled, nothing else was connected. I just restarted PG and
vacuumed again... no obvious change of disk size (still 6.4 gigs)... but
this changed:

amarokcollection=# select relname, pg_relation_size(oid) FROM pg_class
ORDER BY 2 DESC LIMIT 20;
             relname             | pg_relation_size
---------------------------------+------------------
 pg_attribute_relid_attnam_index |       2881069056
 pg_class_relname_nsp_index      |       1185890304
 pg_attribute_relid_attnum_index |        710565888
 pg_depend_depender_index        |        451190784
 pg_depend_reference_index       |        422363136
 pg_type_typname_nsp_index       |        245293056
 pg_class_oid_index              |        208420864
 pg_shdepend_depender_index      |        159465472
 pg_shdepend_reference_index     |         97271808
 pg_constraint_conname_nsp_index |         55107584
 pg_index_indrelid_index         |         50774016
 pg_index_indexrelid_index       |         50774016
 pg_type_oid_index               |         44400640
 pg_attrdef_adrelid_adnum_index  |         15761408
 pg_constraint_conrelid_index    |         15728640
 pg_constraint_oid_index         |         15728640
 pg_attrdef_oid_index            |         12541952
 pg_constraint_contypid_index    |          7618560
 statistics_url_key              |          2113536
 url_stats                       |          2105344

The old output was:

postgres=# select relname, pg_relation_size(oid) FROM pg_class ORDER
BY 2 DESC LIMIT 10;
             relname             | pg_relation_size
---------------------------------+------------------
 pg_shdepend_depender_index      |        159465472
 pg_shdepend_reference_index     |         97271808
 pg_proc                         |           352256
 pg_proc_proname_args_nsp_index  |           344064
 pg_depend                       |           237568
 pg_attribute_relid_attnam_index |           204800
 pg_attribute                    |           204800
 pg_depend_reference_index       |           172032
 pg_depend_depender_index        |           155648
 pg_statistic                    |           131072
(10 rows)

I'm at your disposal on this... it's weird.
Re: [ADMIN] Bloated pg_shdepend_depender_index
Gregory Maxwell wrote:
> I recently noticed that this database has grown to a huge size. ...
> Which I found to be somewhat odd because none of the tables have more
> than around 1000 rows. I hadn't been vacuuming because I didn't
> think that anything would ever be deleted, so I performed a vacuum
> full... but no luck, it was still about 6.4GB.
>
> With some help of the folks on IRC I discovered...
> postgres=# select relname, pg_relation_size(oid) FROM pg_class ORDER
> BY 2 DESC LIMIT 2;
>            relname            | pg_relation_size
> -----------------------------+------------------
>  pg_shdepend_depender_index  |        159465472
>  pg_shdepend_reference_index |         97271808
> (2 rows)

Huh, that's very strange.

The pg_shdepend table is a shared catalog, unlike most other catalogs.
Still I don't see why the indexes would not shrink. Do pg_authid,
pg_database, or pg_tablespace show a similar behavior? Is amarok creating a
lot of temp tables or something? When you did that vacuum full, was there a
process connected to the same or another database that could be holding
onto the pg_shdepend table?

I'll have a look at the code, but I thought I might throw that out, just in
case it rings a bell.

--
Alvaro Herrera   http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
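The checks Alvaro suggests could be run as something like the following (pg_relation_size is the same function already used elsewhere in the thread):

```sql
-- Compare the shared catalogs and their indexes; if only the pg_shdepend
-- objects are large, the bloat is specific to dependency churn rather
-- than to shared catalogs in general.
SELECT relname, pg_relation_size(oid)
  FROM pg_class
 WHERE relname LIKE 'pg_shdepend%'
    OR relname LIKE 'pg_authid%'
    OR relname LIKE 'pg_database%'
    OR relname LIKE 'pg_tablespace%'
 ORDER BY 2 DESC;
```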