Re: [HACKERS] remove flatfiles.c
> That's what I want to believe. But picture if you have, say, a 1-terabyte
> table which is 50% dead tuples and you don't have a spare 1 terabyte to
> rewrite the whole table. But trying to VACUUM FULL that table is going to
> be horridly painful too, and you'll still have bloated indexes afterwards.
> You might as well just live with the 50% waste, especially since if you
> did a full-table update once you'll probably do it again sometime. I'm
> having a hard time believing that VACUUM FULL really has any interesting
> use-case anymore.

This was almost exactly the scenario I faced recently. A production
database unexpectedly filled up its partition. On investigation, we found
a developer had added a component to the application that updated every
row in one table each day, exhausting the free space map. Over time, most
of the tables in the system had grown to contain 50-70% dead tuples.

The owner of the system was understandably reluctant to dump and restore
the system, and there wasn't enough space left on the system to rewrite
any of the large tables. In the end, I dropped a table (the one owned by
the offending developer... 8-), and this gave me just enough space to
VACUUM FULL one table at a time.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] remove flatfiles.c
On Tue, Sep 01, 2009 at 07:42:56PM -0400, Tom Lane wrote:
> Greg Stark gsst...@mit.edu writes:
> > On Wed, Sep 2, 2009 at 12:01 AM, Alvaro Herrera
> > alvhe...@commandprompt.com wrote:
> > > The use cases where VACUUM FULL wins currently are where storing two
> > > copies of the table and its indexes concurrently just isn't practical.
> >
> > Yeah, but then do you really need to use VACUUM FULL? If that's really
> > a problem then there ain't that many dead tuples around.
>
> That's what I want to believe. But picture if you have, say, a 1-terabyte
> table which is 50% dead tuples and you don't have a spare 1 terabyte to
> rewrite the whole table. But trying to VACUUM FULL that table is going to
> be horridly painful too, and you'll still have bloated indexes afterwards.
> You might as well just live with the 50% waste, especially since if you
> did a full-table update once you'll probably do it again sometime. I'm
> having a hard time believing that VACUUM FULL really has any interesting
> use-case anymore.

I have a client who uses temp tables heavily, hundreds of thousands of
creates and drops per day. They also have long-running queries. The only
thing that keeps catalog bloat somewhat in check is vacuum full on the
bloated catalogs a few times a day. Without that, pg_class, pg_attribute
etc. quickly balloon to thousands of pages.

-dg

--
David Gould    da...@sonic.net    510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.
Re: [HACKERS] remove flatfiles.c
daveg wrote:
> On Tue, Sep 01, 2009 at 07:42:56PM -0400, Tom Lane wrote:
> > Greg Stark gsst...@mit.edu writes:
> > > On Wed, Sep 2, 2009 at 12:01 AM, Alvaro Herrera
> > > alvhe...@commandprompt.com wrote:
> > > > The use cases where VACUUM FULL wins currently are where storing two
> > > > copies of the table and its indexes concurrently just isn't practical.
> > >
> > > Yeah, but then do you really need to use VACUUM FULL? If that's really
> > > a problem then there ain't that many dead tuples around.
> >
> > That's what I want to believe. But picture if you have, say, a 1-terabyte
> > table which is 50% dead tuples and you don't have a spare 1 terabyte to
> > rewrite the whole table. But trying to VACUUM FULL that table is going to
> > be horridly painful too, and you'll still have bloated indexes afterwards.
> > You might as well just live with the 50% waste, especially since if you
> > did a full-table update once you'll probably do it again sometime. I'm
> > having a hard time believing that VACUUM FULL really has any interesting
> > use-case anymore.
>
> I have a client who uses temp tables heavily, hundreds of thousands of
> creates and drops per day. They also have long-running queries. The only
> thing that keeps catalog bloat somewhat in check is vacuum full on the
> bloated catalogs a few times a day. Without that, pg_class, pg_attribute
> etc. quickly balloon to thousands of pages.

That's a rate of more than one create and drop per second. How does your
client handle the fact that VACUUM FULL will exclusively lock those
catalog tables? Without knowing more, it looks like a bit of a design
issue.

cheers

andrew
Re: [HACKERS] remove flatfiles.c
On Thu, Sep 03, 2009 at 07:57:25PM -0400, Andrew Dunstan wrote:
> daveg wrote:
> > On Tue, Sep 01, 2009 at 07:42:56PM -0400, Tom Lane wrote:
> > > I'm having a hard time believing that VACUUM FULL really has any
> > > interesting use-case anymore.
> >
> > I have a client who uses temp tables heavily, hundreds of thousands of
> > creates and drops per day. They also have long-running queries. The only
> > thing that keeps catalog bloat somewhat in check is vacuum full on the
> > bloated catalogs a few times a day. Without that, pg_class, pg_attribute
> > etc. quickly balloon to thousands of pages.
>
> That's a rate of more than one create and drop per second. How does your
> client handle the fact that VACUUM FULL will exclusively lock those
> catalog tables? Without knowing more, it looks like a bit of a design
> issue.

I'd say it is several per second. They wait for the catalog locks
sometimes. This is not an interactive application, so that is somewhat
acceptable. It also occasionally causes deadlocks, which is less
agreeable. There are various reasons for the heavy use of temps, mainly
having to do with loading external feeds or reusing intermediate query
results in a series of queries. It would be great if there were a way to
have temp tables that did not get cataloged, e.g. local cache only.

-dg

--
David Gould    da...@sonic.net    510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.
Re: [HACKERS] remove flatfiles.c
On Wed, Sep 2, 2009 at 6:30 AM, Jaime Casanova
jcasa...@systemguards.com.ec wrote:
> On Tue, Sep 1, 2009 at 9:55 PM, Robert Haas robertmh...@gmail.com wrote:
> > I'm a bit skeptical about partitioning as a solution, too. The planner
> > is just not clever enough with partitioned tables, yet.
>
> Yeah, we need to fix that :)
>
> > I think we're already reaching the point where the pains of dealing
> > with partitioned tables are usually less than the pains of dealing
> > with VACUUM FULL.
>
> analyze and vacuum on a *very* big table, and even scanning a huge index,
> are no joke either...

Hm, not sure I see this. The sample size for ANALYZE is not dependent on
the size of the table, only on the stats target. And vacuum with the
visibility map is now going to be dependent only on the number of updates
to the table, not on the size of the table.

The problem use cases we have today are only when you really do have
enough dead space to clean up that you want to compact the file -- but
not so much that it's worth rewriting the whole table using CLUSTER or
ALTER TABLE.

Perhaps we should go one version with an enable_legacy_full_vacuum which
defaults to off. That would at least let us hear about use cases where
people are unhappy with a replacement.

I did start a while ago on a replacement which used the existing rewrite
mechanism to do the equivalent of CLUSTER without changing the ordering.
I forget where I left that, but I could go back and look at it. I'll be
busy for the next few weeks though, so it won't be right away.

--
greg
http://mit.edu/~gsstark/resume.pdf
Re: [HACKERS] remove flatfiles.c
All,

> I'm having a hard time believing that VACUUM FULL really has any
> interesting use-case anymore.

Basically, it's for:

a) people who don't understand CLUSTER (easily fixed: simply create a
VACUUM FULL command which just does CLUSTER on the primary key)

b) people who are completely out of space on disk and are trying to
shrink the database to free up space.

For (b), I think it's OK to just tell those people that they need to move
the database files or find something else to delete. Most of the time,
they have to do that *anyway* in order for VACUUM FULL to work, since the
transaction log is on the same disk. We just need a little more
documentation, is all.

> The problem use cases we have today are only when you really do have
> enough dead space to clean up that you want to compact the file -- but
> not so much that it's worth rewriting the whole table using CLUSTER or
> ALTER TABLE.

I haven't seen this use-case in the field. I'm not sure that it actually
exists. Anyone run across a case where this made sense? Recently I
actually had a client dump and reload their database rather than running
VACUUM FULL; a reload took 4 hours but VACUUM FULL took more than 18.

> Perhaps we should go one version with an enable_legacy_full_vacuum
> which defaults to off. That would at least let us hear about use cases
> where people are unhappy with a replacement.

I think we do need to do this, just because people won't have changed
their admin scripts. But the goal should be to dump VACUUM FULL entirely
by 8.6 if we *don't* get serious use-cases.

> I did start a while ago on a replacement which used the existing rewrite
> mechanism to do the equivalent of cluster without changing the ordering.
> I forget where I left that but I could go back and look at it. I'll be
> busy for the next few weeks though so it won't be right away.

This would be very helpful.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
Re: [HACKERS] remove flatfiles.c
On Wed, 2009-09-02 at 10:41 -0700, Josh Berkus wrote:
> All,
>
> > I'm having a hard time believing that VACUUM FULL really has any
> > interesting use-case anymore.
>
> Basically, it's for:
>
> a) people who don't understand CLUSTER (easily fixed: simply create a
> VACUUM FULL command which just does CLUSTER on the primary key)
>
> b) people who are completely out of space on disk and are trying to
> shrink the database to free up space.
>
> For (b), I think it's OK to just tell those people that they need to
> move the database files or find something else to delete. Most of the
> time, they have to do that *anyway* in order for VACUUM FULL to work,
> since the transaction log is on the same disk. We just need a little
> more documentation, is all.

Right.

> > The problem use cases we have today are only when you really do have
> > enough dead space to clean up that you want to compact the file -- but
> > not so much that it's worth rewriting the whole table using CLUSTER or
> > ALTER TABLE.
>
> I haven't seen this use-case in the field. I'm not sure that it actually
> exists. Anyone run across a case where this made sense?

No.

> Recently I actually had a client dump and reload their database rather
> than running VACUUM FULL; a reload took 4 hours but VACUUM FULL took
> more than 18.

Exactly.

> > Perhaps we should go one version with an enable_legacy_full_vacuum
> > which defaults to off. That would at least let us hear about use cases
> > where people are unhappy with a replacement.
>
> I think we do need to do this, just because people won't have changed
> their admin scripts. But the goal should be to dump VACUUM FULL entirely
> by 8.6 if we *don't* get serious use-cases.

Agreed, but I think we shouldn't even put it in the postgresql.conf by
default. Just document that it exists. Settings for the sake of settings
(even ones that may have a corner case) seem to confuse users.

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Re: [HACKERS] remove flatfiles.c
On Wed, Sep 2, 2009 at 6:41 PM, Josh Berkus j...@agliodbs.com wrote:
> All,
>
> > I'm having a hard time believing that VACUUM FULL really has any
> > interesting use-case anymore.
>
> Basically, it's for:
>
> a) people who don't understand CLUSTER (easily fixed: simply create a
> VACUUM FULL command which just does CLUSTER on the primary key)

I don't think we want to cluster on the primary key. I think we just want
to rewrite the table keeping the same physical ordering.

> > The problem use cases we have today are only when you really do have
> > enough dead space to clean up that you want to compact the file -- but
> > not so much that it's worth rewriting the whole table using CLUSTER or
> > ALTER TABLE.
>
> I haven't seen this use-case in the field. I'm not sure that it actually
> exists. Anyone run across a case where this made sense?

Well, I've certainly seen people whose disks are more than 50% full. They
tend to be the same people who want to compact their tables. I can't say
whether any of them had a single table with associated indexes that was
taking up more than 50%, but it's not uncommon to have a single table
that dominates your database.

> Recently I actually had a client dump and reload their database rather
> than running VACUUM FULL; a reload took 4 hours but VACUUM FULL took
> more than 18.
>
> > Perhaps we should go one version with an enable_legacy_full_vacuum
> > which defaults to off. That would at least let us hear about use cases
> > where people are unhappy with a replacement.
>
> I think we do need to do this, just because people won't have changed
> their admin scripts. But the goal should be to dump VACUUM FULL entirely
> by 8.6 if we *don't* get serious use-cases.

We could deal with the admin scripts by making VACUUM FULL do the new
behaviour. But I actually don't really like that; I would prefer to break
VACUUM FULL, since anyone doing it routinely is probably mistaken. We
could name the command something which is more descriptive, like VACUUM
REWRITE or VACUUM REBUILD.

--
greg
http://mit.edu/~gsstark/resume.pdf
Re: [HACKERS] remove flatfiles.c
Greg Stark gsst...@mit.edu wrote:
> I don't think we want to cluster on the primary key. I think we just
> want to rewrite the table keeping the same physical ordering.

Well, if that's what you want to do, couldn't you do something like this?

Lock the table.
Drop all indexes.
Pass over the heap with two pointers, one to the first available empty
space and one to the first non-dead row past that, and move rows inside
the existing file.
Rebuild the indexes.
Release the lock.

-Kevin
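The two-pointer pass Kevin describes can be sketched in miniature. This
is a hypothetical toy model, not PostgreSQL code: the heap is a Python
list where None marks a dead or empty slot, and one pass moves live rows
toward the front while preserving their physical order.

```python
def compact_heap(slots):
    """In-place two-pointer compaction of a toy 'heap'.

    slots: list where None represents a dead/empty slot and anything
    else is a live row.  Live rows keep their relative (physical)
    order.  Returns the number of live rows, i.e. the point at which
    the file could be truncated.
    """
    free = 0                          # first available empty slot
    for scan in range(len(slots)):    # first live row at or past 'free'
        if slots[scan] is not None:
            if scan != free:
                slots[free] = slots[scan]  # move row earlier in the file
                slots[scan] = None         # old location becomes empty
            free += 1
    return free

heap = ["r1", None, "r2", None, None, "r3", "r4"]
live = compact_heap(heap)
# heap[:live] == ["r1", "r2", "r3", "r4"]; the tail is all None
```

As the thread goes on to discuss, the hard parts this sketch ignores are
exactly the expensive ones: doing the moves crash-safely and keeping the
indexes consistent while rows change position.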
Re: [HACKERS] remove flatfiles.c
Greg,

> I don't think we want to cluster on the primary key. I think we just
> want to rewrite the table keeping the same physical ordering.

Agreed.

> Well, I've certainly seen people whose disks are more than 50% full.
> They tend to be the same people who want to compact their tables. I
> can't say whether any of them had a single table with associated indexes
> that was taking up more than 50%, but it's not uncommon to have a single
> table that dominates your database.

Those people would also need the tables involved to be fairly small, or
to be able to afford a lot of downtime. VACUUM FULL on a 100GB table with
current commodity servers can take upwards of 8 hours. I really think the
set of people who have more available downtime than disk space is a
vanishingly small group. However, I'll do a survey. Why not?

> We could deal with the admin scripts by making VACUUM FULL do the new
> behaviour. But I actually don't really like that; I would prefer to
> break VACUUM FULL, since anyone doing it routinely is probably mistaken.
> We could name the command something which is more descriptive, like
> VACUUM REWRITE or VACUUM REBUILD.

Agreed. I like VACUUM REWRITE, as it makes it fairly clear what's going
on.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
Re: [HACKERS] remove flatfiles.c
Greg Stark gsst...@mit.edu writes:
> On Wed, Sep 2, 2009 at 6:41 PM, Josh Berkus j...@agliodbs.com wrote:
> > > Perhaps we should go one version with an enable_legacy_full_vacuum
> > > which defaults to off. That would at least let us hear about use
> > > cases where people are unhappy with a replacement.
> >
> > I think we do need to do this, just because people won't have changed
> > their admin scripts. But the goal should be to dump VACUUM FULL
> > entirely by 8.6 if we *don't* get serious use-cases.
>
> We could deal with the admin scripts by making VACUUM FULL do the new
> behaviour. But I actually don't really like that; I would prefer to
> break VACUUM FULL, since anyone doing it routinely is probably mistaken.
> We could name the command something which is more descriptive, like
> VACUUM REWRITE or VACUUM REBUILD.

What's wrong with just ignoring the FULL option? It's a reserved word
anyway because of FULL OUTER JOINs, so there's no syntactic benefit to be
had from eliminating it from the VACUUM syntax.

regards, tom lane
Re: [HACKERS] remove flatfiles.c
On Wed, Sep 2, 2009 at 1:52 PM, Greg Stark gsst...@mit.edu wrote:
> We could deal with the admin scripts by making VACUUM FULL do the new
> behaviour. But I actually don't really like that; I would prefer to
> break VACUUM FULL, since anyone doing it routinely is probably mistaken.

So I have a script that goes and finds bloated tables and runs VACUUM
FULL on them in the middle of the night if the bloat passes a certain
threshold. The tables are small enough and the number of users is low
enough that this doesn't cause any problems for me. I'm OK if the name of
the command changes, but I'd like there to be a command that I can pass a
table name to and get my table debloated without having to make any
follow-on decisions (such as picking an index to cluster by).

...Robert
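The decision logic of a script like the one Robert describes might look
roughly like this. This is a hypothetical sketch, not his actual script:
the table names, tuple counts, and the 20% threshold are invented, and
the live/dead counts stand in for numbers one could collect from
pg_stat_user_tables before issuing VACUUM FULL (or a successor command)
on the offenders.

```python
def tables_to_debloat(stats, threshold=0.20):
    """Pick tables whose dead-tuple fraction exceeds 'threshold'.

    stats: dict mapping table name -> (live_tuples, dead_tuples).
    Returns the sorted list of table names a nightly job would hand
    to VACUUM FULL.  Empty tables are skipped.
    """
    bloated = []
    for table, (live, dead) in stats.items():
        total = live + dead
        if total and dead / total > threshold:
            bloated.append(table)
    return sorted(bloated)

stats = {
    "orders":   (900_000, 100_000),  # 10% dead: leave alone
    "sessions": (200_000, 300_000),  # 60% dead: debloat tonight
    "log":      (0, 0),              # empty: nothing to do
}
# tables_to_debloat(stats) -> ["sessions"]
```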
Re: [HACKERS] remove flatfiles.c
Robert Haas robertmh...@gmail.com writes:
> So I have a script that goes and finds bloated tables and runs VACUUM
> FULL on them in the middle of the night if the bloat passes a certain
> threshold. The tables are small enough and the number of users is low
> enough that this doesn't cause any problems for me. I'm OK if the name
> of the command changes, but I'd like there to be a command that I can
> pass a table name to and get my table debloated without having to make
> any follow-on decisions (such as picking an index to cluster by).

I think we *should* have a command that works like CLUSTER except it just
seqscans the source table without depending on any particular index.
Whether it's called VACUUM FULL or something else is a detail.

regards, tom lane
Re: [HACKERS] remove flatfiles.c
On Wed, Sep 2, 2009 at 6:57 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
> Well, if that's what you want to do, couldn't you do something like
> this?
>
> Lock the table.
> Drop all indexes.
> Pass over the heap with two pointers, one to the first available empty
> space and one to the first non-dead row past that, and move rows inside
> the existing file.
> Rebuild the indexes.
> Release the lock.

Well, dropping the indexes and moving tuples are both hard if you care
about crash-safety and transactional integrity. The way we rewrite tables
now is:

Lock the table.
Create a new filenode.
Scan the old table and copy each record into the new filenode, keeping
update chains intact.
Rebuild all indexes for the table (using a similar strategy with new
relfilenodes).
Commit the transaction.

If the transaction aborts at any point, you still have the old pg_class
record, which points to the old relfilenode, and all the old indexes are
still valid. We have all the pieces we need to do this; it's just a
matter of putting them together with a command to call them.

A big part of why VACUUM FULL is annoying is the complexity of moving
tuples in place. VACUUM FULL has to mark the old tuples and the new
copies with its xid, and it can't truncate the relation until it commits
that xid.

Actually, I wonder how much performance improvement would come on normal
DML just from not having to check xvac in the visibility checks. It's
probably not much, but...

--
greg
http://mit.edu/~gsstark/resume.pdf
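The crash-safety property Greg describes can be mimicked in a toy model.
This is purely illustrative and uses none of PostgreSQL's actual
internals: the rewrite builds a complete new copy of the table, and only
a final atomic "commit" repoints the catalog at it, so an abort at any
earlier moment leaves the old file untouched and valid.

```python
class ToyCatalog:
    """Maps a table name to its current 'relfilenode' (a list of rows)."""

    def __init__(self):
        self.filenode = {}

    def rewrite(self, table, fail_midway=False):
        """CLUSTER-style rewrite without reordering: copy live rows
        (dead ones are None) into a new filenode in physical order,
        then atomically point the catalog at the new copy.  An abort
        before that final swap leaves the old filenode intact."""
        old = self.filenode[table]
        new = []
        for i, row in enumerate(old):
            if fail_midway and i == 1:
                # simulated crash/abort: the catalog still points at 'old'
                raise RuntimeError("crash during rewrite")
            if row is not None:        # skip dead tuples
                new.append(row)
        self.filenode[table] = new      # the 'commit': one atomic swap

cat = ToyCatalog()
cat.filenode["t"] = ["a", None, "b", None, "c"]
try:
    cat.rewrite("t", fail_midway=True)
except RuntimeError:
    pass                # aborted: table keeps its original, bloated copy
cat.rewrite("t")        # committed: dead tuples gone, order preserved
```

The cost, as the thread notes elsewhere, is that you must hold both
copies on disk at once, which is exactly the case where the old in-place
VACUUM FULL was supposed to win.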
Re: [HACKERS] remove flatfiles.c
On Wed, Sep 2, 2009 at 2:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Greg Stark gsst...@mit.edu writes:
> > We could deal with the admin scripts by making VACUUM FULL do the new
> > behaviour. But I actually don't really like that; I would prefer to
> > break VACUUM FULL, since anyone doing it routinely is probably
> > mistaken. We could name the command something which is more
> > descriptive, like VACUUM REWRITE or VACUUM REBUILD.
>
> What's wrong with just ignoring the FULL option? It's a reserved word
> anyway because of FULL OUTER JOINs, so there's no syntactic benefit to
> be had from eliminating it from the VACUUM syntax.

Silent behavior changes are usually a bad idea.

...Robert
Re: [HACKERS] remove flatfiles.c
On Wed, Sep 2, 2009 at 2:54 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
> > So I have a script that goes and finds bloated tables and runs VACUUM
> > FULL on them in the middle of the night if the bloat passes a certain
> > threshold. The tables are small enough and the number of users is low
> > enough that this doesn't cause any problems for me. I'm OK if the name
> > of the command changes, but I'd like there to be a command that I can
> > pass a table name to and get my table debloated without having to make
> > any follow-on decisions (such as picking an index to cluster by).
>
> I think we *should* have a command that works like CLUSTER except it
> just seqscans the source table without depending on any particular
> index. Whether it's called VACUUM FULL or something else is a detail.

Yeah. We could do this by extending the syntax for CLUSTER (e.g. CLUSTER
[VERBOSE] tablename [USING indexname | WITH NO INDEX]), but I'm not sure
whether that has any real advantage over just using the existing command
name.

I confess to being a little fuzzy on the details of how this
implementation (seq-scanning the source table for live tuples) is
different from, or better than, the current VACUUM FULL implementation.
Can someone fill me in?

...Robert
Re: [HACKERS] remove flatfiles.c
On Wed, Sep 2, 2009 at 8:10 PM, Robert Haas robertmh...@gmail.com wrote:
> I confess to being a little fuzzy on the details of how this
> implementation (seq-scanning the source table for live tuples) is
> different from, or better than, the current VACUUM FULL implementation.
> Can someone fill me in?

VACUUM FULL is a *lot* more complex. It scans pages *backwards* from the
end (which does wonderful things on rotating media), marks each live
tuple it finds as "moved off", finds a new place for it (using the free
space map, I think?), inserts the tuple on the new page, marks it "moved
in", and updates the indexes. Then it commits the transaction but keeps
the lock. Then it has to vacuum all the indexes to remove the references
to the old tuples at the end of the table. I think it has to commit that
too before it can finally truncate the table.

The backwards scan is awful for rotating media. The reading from the end
and writing to the beginning is bad too, though hopefully the cache can
help that. A lot of the complexity comes in from other parts of the
system that have to be aware of tuples that have been moved off or moved
in: they have to be able to check whether the vacuum committed or not.

That reminds me, there was another proposal to do an online vacuum full
similar to our concurrent index builds: do no-op updates to tuples at the
end of the table, hopefully finding space for them earlier in the table,
wait until those transactions are no longer visible to anyone else, and
then truncate. (Actually, I think you could just not do anything and let
regular lazy vacuum do the truncate.) That might be a good practical
alternative for sites where copying their entire table isn't practical.

--
greg
http://mit.edu/~gsstark/resume.pdf
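The tuple-moving scheme Greg describes can be caricatured in a few lines.
This is a toy model only, with invented page sizes and no xids, index
maintenance, or WAL: walk pages backwards from the end, move each live
tuple into the first earlier page with free room, then truncate the
now-empty tail.

```python
def vacuum_full_toy(pages, page_size):
    """Toy model of old-style VACUUM FULL tuple moving.

    pages: list of lists of live 'tuples' (dead ones assumed already
    removed by the initial lazy-vacuum-style pass).  Scans backwards,
    moving tuples into the first earlier page with a free slot, then
    truncates trailing empty pages.  A tuple with no earlier room
    simply stays put, so the file may not shrink all the way.
    """
    for src in range(len(pages) - 1, 0, -1):   # backwards from the end
        remaining = []
        for tup in pages[src]:
            # first earlier page with room: a stand-in free-space map
            dest = next((d for d in range(src)
                         if len(pages[d]) < page_size), None)
            if dest is None:
                remaining.append(tup)    # no room earlier: stays put
            else:
                pages[dest].append(tup)  # "moved in" on an earlier page
        pages[src] = remaining
    while pages and not pages[-1]:
        pages.pop()                      # truncate the empty tail
    return pages

pages = [["a"], ["b"], [], ["c", "d"]]
vacuum_full_toy(pages, page_size=2)
# -> [["a", "c"], ["b", "d"]]  (file shrinks from 4 pages to 2)
```

Even in the toy, every move touches two pages far apart in the file,
which hints at why the backward scan pattern is so unkind to rotating
disks, and it does nothing about the index traffic Tom details next.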
Re: [HACKERS] remove flatfiles.c
Greg Stark gsst...@mit.edu writes:
> The backwards scan is awful for rotating media. The reading from the
> end and writing to the beginning is bad too, though hopefully the cache
> can help that.

Yeah. And all that pales in comparison to what happens in the indexes.
You have to insert index entries (retail) for each moved-in tuple, then,
after doing the intermediate commit, you run around and remove the index
entries for the moved-off tuples. Lots of nonsequential access to insert
the entries. The cleanup isn't so bad --- it's comparable to what regular
lazy VACUUM has to do --- but that's just one step in a very expensive
process.

regards, tom lane
Re: [HACKERS] remove flatfiles.c
Greg Stark gsst...@mit.edu writes:
> It scans pages *backwards* from the end (which does wonderful things on
> rotating media), marks each live tuple it finds as "moved off", finds a
> new place for it (using the free space map, I think?).

BTW, VACUUM FULL doesn't use the free space map --- that code predates
the FSM by a lot. It builds its own map of free space during its initial
lazy-VACUUM-equivalent scan that just removes dead tuples. While I don't
think this hurts performance any, I have seen reports of VACUUM FULL
failing outright on large tables because it runs out of memory for this
map. So that's still another way in which it's not actually all that
useful for huge tables.

regards, tom lane
Re: [HACKERS] remove flatfiles.c
Tom Lane escribió:
> Greg Stark gsst...@mit.edu writes:
> > It scans pages *backwards* from the end (which does wonderful things
> > on rotating media), marks each live tuple it finds as "moved off",
> > finds a new place for it (using the free space map, I think?).
>
> BTW, VACUUM FULL doesn't use the free space map --- that code predates
> the FSM by a lot. It builds its own map of free space during its initial
> lazy-VACUUM-equivalent scan that just removes dead tuples.

Another weird consequence of this is that it bails out if it finds a
tuple larger than it can fit in one of the earlier pages; if there's dead
space to be compacted before that, it's not compacted.

--
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] remove flatfiles.c
On Wed, Sep 2, 2009 at 3:30 PM, Greg Stark gsst...@mit.edu wrote:
> On Wed, Sep 2, 2009 at 8:10 PM, Robert Haas robertmh...@gmail.com wrote:
> > I confess to being a little fuzzy on the details of how this
> > implementation (seq-scanning the source table for live tuples) is
> > different from, or better than, the current VACUUM FULL implementation.
> > Can someone fill me in?
>
> VACUUM FULL is a *lot* more complex. It scans pages *backwards* from the
> end (which does wonderful things on rotating media), marks each live
> tuple it finds as "moved off", finds a new place for it (using the free
> space map, I think?), inserts the tuple on the new page, marks it "moved
> in", and updates the indexes. Then it commits the transaction but keeps
> the lock. Then it has to vacuum all the indexes to remove the references
> to the old tuples at the end of the table. I think it has to commit that
> too before it can finally truncate the table.
>
> The backwards scan is awful for rotating media. The reading from the end
> and writing to the beginning is bad too, though hopefully the cache can
> help that. A lot of the complexity comes in from other parts of the
> system that have to be aware of tuples that have been moved off or moved
> in: they have to be able to check whether the vacuum committed or not.

Ugh.

> That reminds me, there was another proposal to do an online vacuum full
> similar to our concurrent index builds: do no-op updates to tuples at
> the end of the table, hopefully finding space for them earlier in the
> table, wait until those transactions are no longer visible to anyone
> else, and then truncate. (Actually, I think you could just not do
> anything and let regular lazy vacuum do the truncate.) That might be a
> good practical alternative for sites where copying their entire table
> isn't practical.

I don't have a strong opinion about whether it's better to take an
exclusive lock on everything and reorganize freely, or whether it's
better to try to use MVCC semantics to reduce the locking impact (at a
cost of difficulty in the presence of long-running transactions). But I
think it would be really nice to have an incremental way to reduce table
bloat. Locking a table (or, even better, part of a table) for a couple of
seconds once an hour for several days or weeks figures to be practical in
some (many?) environments where locking a table for minutes or hours is
not.

...Robert
Re: [HACKERS] remove flatfiles.c
Alvaro Herrera alvhe...@commandprompt.com writes:
> Another weird consequence of this is that it bails out if it finds a
> tuple larger than it can fit in one of the earlier pages; if there's
> dead space to be compacted before that, it's not compacted.

I don't find a lot wrong with that. The code defines its purpose as being
to shorten the table file length. Once it hits a page that can't be
emptied, it cannot shorten the file any further, so why shouldn't it
stop?

regards, tom lane
Re: [HACKERS] remove flatfiles.c
Tom Lane escribió: Alvaro Herrera alvhe...@commandprompt.com writes: Another weird consequence of this is that it bails out if it finds a tuple larger than it can fit in one of the earlier pages; if there's dead space to be compacted before that, it's not compacted. I don't find a lot wrong with that. The code defines its purpose as being to shorten the table file length. Once it hits a page that can't be emptied, it cannot shorten the file any further, so why shouldn't it stop? All that work, and it wasn't capable of defragging the other pages? At the very least it could register them in the FSM. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] remove flatfiles.c
Alvaro Herrera alvhe...@commandprompt.com writes: Tom Lane escribió: I don't find a lot wrong with that. The code defines its purpose as being to shorten the table file length. Once it hits a page that can't be emptied, it cannot shorten the file any further, so why shouldn't it stop? All that work, and it wasn't capable of defragging the other pages? At the very least it could register them in the FSM. You mean like vac_update_fsm() ? regards, tom lane
Re: [HACKERS] remove flatfiles.c
Tom Lane escribió: Alvaro Herrera alvhe...@commandprompt.com writes: Tom Lane escribió: I don't find a lot wrong with that. The code defines its purpose as being to shorten the table file length. Once it hits a page that can't be emptied, it cannot shorten the file any further, so why shouldn't it stop? All that work, and it wasn't capable of defragging the other pages? At the very least it could register them in the FSM. You mean like vac_update_fsm() ? Huh :-) -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] remove flatfiles.c
Robert Haas wrote: On Tue, Sep 1, 2009 at 9:29 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Ron Mayer wrote: Greg Stark wrote: That's what I want to believe. But picture if you have, say a 1-terabyte table which is 50% dead tuples and you don't have a spare 1-terabytes to rewrite the whole table. Could one hypothetically do update bigtable set pk = pk where ctid in (select ctid from bigtable order by ctid desc limit 100); vacuum; and repeat until max(ctid) is small enough? I remember Hannu Krosing said they used something like that to shrink really bloated tables. Maybe we should try to explicitly support a mechanism that worked in that fashion. I think I tried it at some point and found that the problem with it was that ctid was too limited in what it was able to do. I think a way to incrementally shrink large tables would be enormously beneficial. Maybe vacuum could try to do a bit of that each time it runs. Yet when I try it now, I'm having trouble making it work. Would you expect the ctid to be going down in the psql session shown below? I wonder why it isn't. 
regression=# create table shrink_test as select * from tenk1;
SELECT
regression=# delete from shrink_test where (unique2 % 2) = 0;
DELETE 5000
regression=# create index shrink_test_unique1 on shrink_test(unique1);
CREATE INDEX
regression=# select max(ctid) from shrink_test;
   max
----------
 (333,10)
(1 row)

regression=# update shrink_test set unique1=unique1 where ctid in (select ctid from shrink_test order by ctid desc limit 100);
UPDATE 100
regression=# vacuum shrink_test;
VACUUM
regression=# select max(ctid) from shrink_test;
   max
----------
 (333,21)
(1 row)

regression=# update shrink_test set unique1=unique1 where ctid in (select ctid from shrink_test order by ctid desc limit 100);
UPDATE 100
regression=# vacuum shrink_test;
VACUUM
regression=# select max(ctid) from shrink_test;
   max
----------
 (333,27)
(1 row)

regression=# update shrink_test set unique1=unique1 where ctid in (select ctid from shrink_test order by ctid desc limit 100);
UPDATE 100
regression=# vacuum shrink_test;
VACUUM
regression=# select max(ctid) from shrink_test;
   max
----------
 (333,33)
(1 row)
Re: [HACKERS] remove flatfiles.c
On Wed, Sep 2, 2009 at 11:55 PM, Ron Mayer rm...@cheapcomplexdevices.com wrote: Yet when I try it now, I'm having trouble making it work. Would you expect the ctid to be going down in the psql session shown below? I wonder why it isn't. Even before HOT we preferentially tried to put updated tuples on the same page they were on before. On pre-8.3 if you did these updates *without* the vacuum they would eventually be forced to find a new page and hopefully would find one earlier in the table. On 8.4 HOT will (hopefully) prevent even that from working. Unless you have a long-running transaction in the background it will clean up the old tuples in the chain on the page each time the page fills up. You've deleted half the tuples on the page so the updates will always fit in that space. Eventually you'll hit the maximum number of tuples allowed on the page dead or alive. But the vacuums are defeating that too. A special purpose command could work around all of this. -- greg http://mit.edu/~gsstark/resume.pdf
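One way to confirm this explanation is to watch the HOT-update counter in pg_stat_user_tables, which exists as of 8.3; for example, against the shrink_test table from the session upthread:

  -- How many of the no-op updates were absorbed as HOT updates?
  select n_tup_upd, n_tup_hot_upd
    from pg_stat_user_tables
   where relname = 'shrink_test';

If nearly all of the updates show up as HOT, the new tuple versions never left their original pages, which is exactly why max(ctid) refuses to shrink.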
Re: [HACKERS] remove flatfiles.c
On Wed, Sep 2, 2009 at 8:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark gsst...@mit.edu writes: The backwards scan is awful for rotating media. The reading from the end and writing to the beginning is bad too, though hopefully the cache can help that. Yeah. And all that pales in comparison to what happens in the indexes. You have to insert index entries (retail) for each moved-in tuple, Hm, that could be addressed by buffering index inserts in backend local memory. That's something Heikki proposed a long time ago primarily for improving bulk data loading. Basically it would be a retail version of the bulk loader that we saw at the 10th anniversary where you merge a sorted list into the index. You would still have to flush the buffer at transaction commit but even if it only buffered a few dozen tuples if they're in the same region of the index it would be a win. In this case it could probably buffer hundreds and merge them all into the index en masse. -- greg http://mit.edu/~gsstark/resume.pdf
Re: [HACKERS] remove flatfiles.c
On Mon, 2009-08-31 at 18:53 -0400, Alvaro Herrera wrote: Regarding sync commits that previously happen and now won't, I think the only case worth worrying about is the one in vacuum.c. Do we need a ForceSyncCommit() in there? I'm not sure if vacuum itself already forces sync commit. VACUUM FULL requires ForceSyncCommit(). Not sure why removing them elsewhere is important? Getting robustness wrong is a big, bad thing and this opens us to future error. We already tuned VACUUM so it does very little if it has no work to do, why would one extra I/O improve things so much? If it ain't broke... VACUUM does so many things that I'd rather have it all safely on disk. I'd feel happier with the rule VACUUM always sync commits, so we all remember it and can rely upon it to be the same from release to release. -- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] remove flatfiles.c
Simon Riggs si...@2ndquadrant.com writes: VACUUM does so many things that I'd rather have it all safely on disk. I'd feel happier with the rule VACUUM always sync commits, so we all remember it and can rely upon it to be the same from release to release. Non-FULL vacuum has *never* done a sync commit, except in the unusual corner case that it moves the database's datfrozenxid, which is a corner case that didn't even exist until fairly recently. I think the argument that we should have it force sync for no reason whatsoever is silly. We get beat up on a regular basis about spikes in response time; why would you want to have vacuum creating one when it doesn't need to? As for the FULL case, the sync commit is to try to protect a horribly unsafe kluge that should go away entirely (if vacuum full itself doesn't go away entirely). That's hardly something I want to institutionalize either. regards, tom lane
Re: [HACKERS] remove flatfiles.c
On Tue, 2009-09-01 at 09:58 -0400, Tom Lane wrote: We get beat up on a regular basis about spikes in response time; why would you want to have vacuum creating one when it doesn't need to? If one I/O on a background utility can cause such a spike, we are in serious shitake. I would be more comfortable if the various important things VACUUM does were protected by sync commit. I see no reason to optimise away one I/O just because we might theoretically do so. Any mistake in the theory and we are exposed. Why take the risk? We do many things to check and secure our data, why not this one? If this were suggested separately as an optimisation, you'd laugh and say why bother? -- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] remove flatfiles.c
Simon Riggs si...@2ndquadrant.com writes: On Tue, 2009-09-01 at 09:58 -0400, Tom Lane wrote: We get beat up on a regular basis about spikes in response time; why would you want to have vacuum creating one when it doesn't need to? If one I/O on a background utility can cause such a spike, we are in serious shitake. I would be more comfortable if the various important things VACUUM does were protected by sync commit. I see no reason to optimise away one I/O just because we might theoretically do so. Any mistake in the theory and we are exposed. Why take the risk? *WHAT* risk? Most vacuums do not do a sync commit, and never have. regards, tom lane
Re: [HACKERS] remove flatfiles.c
On Tue, Sep 1, 2009 at 2:58 PM, Tom Lane t...@sss.pgh.pa.us wrote: We get beat up on a regular basis about spikes in response time; why would you want to have vacuum creating one when it doesn't need to? Isn't this sync commit just going to do the same thing that the wal writer is going to do in at most 200ms anyways? As for the FULL case, the sync commit is to try to protect a horribly unsafe kluge that should go away entirely (if vacuum full itself doesn't go away entirely). I'm all for throwing away VACUUM FULL btw. I was thinking of proposing that we replace it with something like CLUSTER which just rewrites the tuples in the order it finds them. The use cases where VACUUM FULL wins currently are where storing two copies of the table and its indexes concurrently just isn't practical. Also perhaps tables where there are too many large indexes to make rebuilding them all in one maintenance window practical. I don't see any way to address these problems without something as complex as xvac and moved_in/moved_off and without the index bloat problems. I think we could improve the i/o access patterns we have currently which make vacuum full so slow, but the fundamental problems would remain. So the question is whether those use cases are worth keeping our existing vacuum full for or whether we could do without it and just recommend partitioning for people with tables large enough to make table rewrites impractical. -- greg http://mit.edu/~gsstark/resume.pdf
Re: [HACKERS] remove flatfiles.c
Greg Stark wrote: The use cases where VACUUM FULL wins currently are where storing two copies of the table and its indexes concurrently just isn't practical. Yeah, but then do you really need to use VACUUM FULL? If that's really a problem then there ain't that many dead tuples around. Also perhaps tables where there are too many large indexes to make rebuilding them all in one maintenance window practical. If that's the concern maybe we oughta do something about concurrently re-creating those indexes somehow. Plain REINDEX doesn't work of course, but maybe we can do some trick with creating a new index and dropping the original one afterwards. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
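The create-then-drop trick can already be done with CREATE INDEX CONCURRENTLY (available since 8.2); a sketch with hypothetical table and index names, noting that it does not directly cover indexes backing primary-key or unique constraints:

  -- Build a fresh, unbloated copy of the index without blocking writes.
  create index concurrently bigtable_pk_idx_new on bigtable (pk);
  -- Once the new index is valid, swap it in for the bloated original.
  drop index bigtable_pk_idx;
  alter index bigtable_pk_idx_new rename to bigtable_pk_idx;

The concurrent build takes two table scans and must wait out in-flight transactions, but it only ever holds weak locks, so the table stays fully available throughout.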
Re: [HACKERS] remove flatfiles.c
On Wed, Sep 2, 2009 at 12:01 AM, Alvaro Herrera alvhe...@commandprompt.com wrote: The use cases where VACUUM FULL wins currently are where storing two copies of the table and its indexes concurrently just isn't practical. Yeah, but then do you really need to use VACUUM FULL? If that's really a problem then there ain't that many dead tuples around. That's what I want to believe. But picture if you have, say a 1-terabyte table which is 50% dead tuples and you don't have a spare 1-terabytes to rewrite the whole table. Also perhaps tables where there are too many large indexes to make rebuilding them all in one maintenance window practical. If that's the concern maybe we oughta do something about concurrently re-creating those indexes somehow. Plain REINDEX doesn't work of course, but maybe we can do some trick with creating a new index and dropping the original one afterwards. Well that doesn't really work if you want to rewrite the table. CLUSTER has to rebuild all the indexes when it's done. I think the solution for both of these is actually partitioning. The bottom line is that having a single table which contains very large amounts of data is awkward to maintain. -- greg http://mit.edu/~gsstark/resume.pdf
Re: [HACKERS] remove flatfiles.c
Greg Stark gsst...@mit.edu writes: On Wed, Sep 2, 2009 at 12:01 AM, Alvaro Herrera alvhe...@commandprompt.com wrote: The use cases where VACUUM FULL wins currently are where storing two copies of the table and its indexes concurrently just isn't practical. Yeah, but then do you really need to use VACUUM FULL? If that's really a problem then there ain't that many dead tuples around. That's what I want to believe. But picture if you have, say a 1-terabyte table which is 50% dead tuples and you don't have a spare 1-terabytes to rewrite the whole table. But trying to VACUUM FULL that table is going to be horridly painful too, and you'll still have bloated indexes afterwards. You might as well just live with the 50% waste, especially since if you did a full-table update once you'll probably do it again sometime. I'm having a hard time believing that VACUUM FULL really has any interesting use-case anymore. regards, tom lane
Re: [HACKERS] remove flatfiles.c
Greg Stark wrote: That's what I want to believe. But picture if you have, say a 1-terabyte table which is 50% dead tuples and you don't have a spare 1-terabytes to rewrite the whole table. Could one hypothetically do update bigtable set pk = pk where ctid in (select ctid from bigtable order by ctid desc limit 100); vacuum; and repeat until max(ctid) is small enough? Sure, it'll take longer than vacuum full; but at first glance it seems lightweight enough to do even on a live, heavily accessed table. IIRC I tried something like this once, and it worked to some extent, but after a few loops didn't shrink the table as much as I had expected.
Re: [HACKERS] remove flatfiles.c
Ron Mayer wrote: Greg Stark wrote: That's what I want to believe. But picture if you have, say a 1-terabyte table which is 50% dead tuples and you don't have a spare 1-terabytes to rewrite the whole table. Could one hypothetically do update bigtable set pk = pk where ctid in (select ctid from bigtable order by ctid desc limit 100); vacuum; and repeat until max(ctid) is small enough? I remember Hannu Krosing said they used something like that to shrink really bloated tables. Maybe we should try to explicitly support a mechanism that worked in that fashion. I think I tried it at some point and found that the problem with it was that ctid was too limited in what it was able to do. The neat thing is that now that we have the visibility fork, each vacuum needn't scan the whole table each time. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] remove flatfiles.c
On Tue, Sep 1, 2009 at 7:42 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark gsst...@mit.edu writes: On Wed, Sep 2, 2009 at 12:01 AM, Alvaro Herrera alvhe...@commandprompt.com wrote: The use cases where VACUUM FULL wins currently are where storing two copies of the table and its indexes concurrently just isn't practical. Yeah, but then do you really need to use VACUUM FULL? If that's really a problem then there ain't that many dead tuples around. That's what I want to believe. But picture if you have, say a 1-terabyte table which is 50% dead tuples and you don't have a spare 1-terabytes to rewrite the whole table. But trying to VACUUM FULL that table is going to be horridly painful too, and you'll still have bloated indexes afterwards. You might as well just live with the 50% waste, especially since if you did a full-table update once you'll probably do it again sometime. I'm having a hard time believing that VACUUM FULL really has any interesting use-case anymore. What if your large table doesn't have an index? Then there's no way to cluster. I'm a bit skeptical about partitioning as a solution, too. The planner is just not clever enough with partitioned tables, yet. ...Robert
Re: [HACKERS] remove flatfiles.c
On Tue, Sep 1, 2009 at 9:29 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Ron Mayer wrote: Greg Stark wrote: That's what I want to believe. But picture if you have, say a 1-terabyte table which is 50% dead tuples and you don't have a spare 1-terabytes to rewrite the whole table. Could one hypothetically do update bigtable set pk = pk where ctid in (select ctid from bigtable order by ctid desc limit 100); vacuum; and repeat until max(ctid) is small enough? I remember Hannu Krosing said they used something like that to shrink really bloated tables. Maybe we should try to explicitly support a mechanism that worked in that fashion. I think I tried it at some point and found that the problem with it was that ctid was too limited in what it was able to do. I think a way to incrementally shrink large tables would be enormously beneficial. Maybe vacuum could try to do a bit of that each time it runs. ...Robert
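Spelled out, the incremental recipe under discussion is a loop like the following, with bigtable and pk as placeholder names; on 8.3+ HOT tends to keep the no-op updates on their original pages, as Greg explains upthread, so treat this as a sketch of the idea rather than a guaranteed shrink:

  -- Rewrite the last ~100 tuples so they can land earlier in the table.
  update bigtable set pk = pk
   where ctid in (select ctid from bigtable order by ctid desc limit 100);
  -- Reclaim the now-dead tail; lazy vacuum truncates trailing empty pages.
  vacuum bigtable;
  -- Check progress; repeat the two steps until this stops shrinking.
  select max(ctid) from bigtable;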
Re: [HACKERS] remove flatfiles.c
Robert Haas escribió: On Tue, Sep 1, 2009 at 7:42 PM, Tom Lane t...@sss.pgh.pa.us wrote: But trying to VACUUM FULL that table is going to be horridly painful too, and you'll still have bloated indexes afterwards. You might as well just live with the 50% waste, especially since if you did a full-table update once you'll probably do it again sometime. I'm having a hard time believing that VACUUM FULL really has any interesting use-case anymore. What if your large table doesn't have an index? Then there's no way to cluster. But there's nothing saying we cannot provide a version of CLUSTER that does not follow any index and just copies the live tuples. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] remove flatfiles.c
On Tue, Sep 1, 2009 at 10:58 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Robert Haas escribió: On Tue, Sep 1, 2009 at 7:42 PM, Tom Lane t...@sss.pgh.pa.us wrote: But trying to VACUUM FULL that table is going to be horridly painful too, and you'll still have bloated indexes afterwards. You might as well just live with the 50% waste, especially since if you did a full-table update once you'll probably do it again sometime. I'm having a hard time believing that VACUUM FULL really has any interesting use-case anymore. What if your large table doesn't have an index? Then there's no way to cluster. But there's nothing saying we cannot provide a version of CLUSTER that does not follow any index and just copies the live tuples. Agreed. ...Robert
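Absent such a command, the same effect can be had by hand when there is room for a second copy of the live tuples; a hedged sketch with placeholder names (it silently discards indexes, constraints, triggers, and grants, which must all be recreated afterwards, and it holds an exclusive lock for the duration):

  begin;
  lock table bigtable in access exclusive mode;
  -- Copy only the live tuples, in physical order; no index required.
  create table bigtable_new as select * from bigtable;
  drop table bigtable;
  alter table bigtable_new rename to bigtable;
  -- Recreate indexes, constraints, and permissions here.
  commit;

Unlike VACUUM FULL this needs the extra disk space for the copy, but it produces a compact table and fresh indexes in a single pass, which is essentially what an index-less CLUSTER would do internally.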
Re: [HACKERS] remove flatfiles.c
On Tue, Sep 1, 2009 at 19:34, Greg Stark gsst...@mit.edu wrote: On Wed, Sep 2, 2009 at 12:01 AM, Alvaro Herrera alvhe...@commandprompt.com wrote: The use cases where VACUUM FULL wins currently are where storing two copies of the table and its indexes concurrently just isn't practical. Yeah, but then do you really need to use VACUUM FULL? If that's really a problem then there ain't that many dead tuples around. That's what I want to believe. But picture if you have, say a 1-terabyte table which is 50% dead tuples and you don't have a spare 1-terabytes to rewrite the whole table. It would be interesting if there were something between VACUUM FULL and CLUSTER which could, say, work on a single 1GB segment at a time in a manner similar to cluster. You would still end up with index bloat like vacuum full, though perhaps not as bad, but shuffling around the tuples should be faster. The idea here is that the files can be truncated individually. Two 500MB files are pretty much the same as a single 1GB file on disk. Of course, I'm hand waving and don't have the technical expertise to figure out if it can be done easily within PostgreSQL.
Re: [HACKERS] remove flatfiles.c
On Tue, Sep 1, 2009 at 9:55 PM, Robert Haas robertmh...@gmail.com wrote: I'm a bit skeptical about partitioning as a solution, too. The planner is just not clever enough with partitioned tables, yet. analyze and vacuum on a *very* big table, and even scanning a huge index, are not a joke either... and yes the planner is not very clever about partitioning and certainly that is something we need to fix, not something we have to live with... not that that will be easy, but hey! we have very brilliant people here (you being one of them) -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157
[HACKERS] remove flatfiles.c
This patch removes flatfiles.c for good. It doesn't change the keeping of locks in dbcommands.c and user.c, because at least some of them are still required. Regarding sync commits that previously happen and now won't, I think the only case worth worrying about is the one in vacuum.c. Do we need a ForceSyncCommit() in there? I'm not sure if vacuum itself already forces sync commit. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. Index: src/backend/access/transam/twophase_rmgr.c === RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/access/transam/twophase_rmgr.c,v retrieving revision 1.8 diff -c -p -r1.8 twophase_rmgr.c *** src/backend/access/transam/twophase_rmgr.c 1 Jan 2009 17:23:36 - 1.8 --- src/backend/access/transam/twophase_rmgr.c 31 Aug 2009 21:52:17 - *** *** 18,24 #include commands/async.h #include pgstat.h #include storage/lock.h - #include utils/flatfiles.h #include utils/inval.h --- 18,23 *** const TwoPhaseCallback twophase_recover_ *** 27,33 NULL, /* END ID */ lock_twophase_recover, /* Lock */ NULL, /* Inval */ - NULL, /* flat file update */ NULL, /* notify/listen */ NULL /* pgstat */ }; --- 26,31 *** const TwoPhaseCallback twophase_postcomm *** 37,43 NULL, /* END ID */ lock_twophase_postcommit, /* Lock */ inval_twophase_postcommit, /* Inval */ - flatfile_twophase_postcommit, /* flat file update */ notify_twophase_postcommit, /* notify/listen */ pgstat_twophase_postcommit /* pgstat */ }; --- 35,40 *** const TwoPhaseCallback twophase_postabor *** 47,53 NULL, /* END ID */ lock_twophase_postabort, /* Lock */ NULL, /* Inval */ - NULL, /* flat file update */ NULL, /* notify/listen */ pgstat_twophase_postabort /* pgstat */ }; --- 44,49 Index: src/backend/access/transam/xact.c === RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/access/transam/xact.c,v retrieving revision 1.274 diff -c -p -r1.274 xact.c *** src/backend/access/transam/xact.c 11 Jun 2009 14:48:54 - 1.274 --- src/backend/access/transam/xact.c 31 
Aug 2009 21:50:41 - *** *** 43,49 #include storage/sinvaladt.h #include storage/smgr.h #include utils/combocid.h - #include utils/flatfiles.h #include utils/guc.h #include utils/inval.h #include utils/memutils.h --- 43,48 *** CommitTransaction(void) *** 1608,1619 /* NOTIFY commit must come before lower-level cleanup */ AtCommit_Notify(); - /* - * Update flat files if we changed pg_database, pg_authid or - * pg_auth_members. This should be the last step before commit. - */ - AtEOXact_UpdateFlatFiles(true); - /* Prevent cancel/die interrupt while cleaning up */ HOLD_INTERRUPTS(); --- 1607,1612 *** PrepareTransaction(void) *** 1797,1803 /* close large objects before lower-level cleanup */ AtEOXact_LargeObject(true); ! /* NOTIFY and flatfiles will be handled below */ /* * Don't allow PREPARE TRANSACTION if we've accessed a temporary table in --- 1790,1796 /* close large objects before lower-level cleanup */ AtEOXact_LargeObject(true); ! /* NOTIFY will be handled below */ /* * Don't allow PREPARE TRANSACTION if we've accessed a temporary table in *** PrepareTransaction(void) *** 1860,1866 StartPrepare(gxact); AtPrepare_Notify(); - AtPrepare_UpdateFlatFiles(); AtPrepare_Inval(); AtPrepare_Locks(); AtPrepare_PgStat(); --- 1853,1858 *** PrepareTransaction(void) *** 1909,1915 /* Clean up the snapshot manager */ AtEarlyCommit_Snapshot(); ! /* notify and flatfiles don't need a postprepare call */ PostPrepare_PgStat(); --- 1901,1907 /* Clean up the snapshot manager */ AtEarlyCommit_Snapshot(); ! 
/* notify doesn't need a postprepare call */ PostPrepare_PgStat(); *** AbortTransaction(void) *** 2036,2042 AtAbort_Portals(); AtEOXact_LargeObject(false); /* 'false' means it's abort */ AtAbort_Notify(); - AtEOXact_UpdateFlatFiles(false); /* * Advertise the fact that we aborted in pg_clog (assuming that we got as --- 2028,2033 *** CommitSubTransaction(void) *** 3764,3771 AtEOSubXact_LargeObject(true, s-subTransactionId, s-parent-subTransactionId); AtSubCommit_Notify(); - AtEOSubXact_UpdateFlatFiles(true, s-subTransactionId, - s-parent-subTransactionId); CallSubXactCallbacks(SUBXACT_EVENT_COMMIT_SUB, s-subTransactionId, s-parent-subTransactionId); --- 3755,3760 *** AbortSubTransaction(void) *** 3885,3892
Re: [HACKERS] remove flatfiles.c
Alvaro Herrera alvhe...@commandprompt.com writes: This patch removes flatfiles.c for good. Aw, you beat me to it. Regarding sync commits that previously happen and now won't, I think the only case worth worrying about is the one in vacuum.c. Do we need a ForceSyncCommit() in there? I'm not sure if vacuum itself already forces sync commit. Hmm, I had been assuming we wouldn't need that anymore. regards, tom lane
Re: [HACKERS] remove flatfiles.c
Tom Lane wrote: Alvaro Herrera alvhe...@commandprompt.com writes: Regarding sync commits that previously happen and now won't, I think the only case worth worrying about is the one in vacuum.c. Do we need a ForceSyncCommit() in there? I'm not sure if vacuum itself already forces sync commit. Hmm, I had been assuming we wouldn't need that anymore. The comment in user.c and dbcommands.c says /* * Force synchronous commit, thus minimizing the window between * creation of the database files and commital of the transaction. If * we crash before committing, we'll have a DB that's taking up disk * space but is not in pg_database, which is not good. */ ForceSyncCommit(); so I think those ones are still necessary. There's another call in RenameDatabase() which I don't think needs a sync commit (because it won't change the dir name), and one in vacuum.c: /* !* If we were able to advance datfrozenxid, mark the flat-file copy of !* pg_database for update at commit, and see if we can truncate pg_clog. !* Also force update if the shared XID-wrap-limit info is stale. */ if (dirty || !TransactionIdLimitIsValid()) - { - database_file_update_needed(); vac_truncate_clog(newFrozenXid); - } } AFAICT this doesn't need a sync commit. (Right now, VACUUM FULL forces one, but lazy vacuum doesn't). -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] remove flatfiles.c
Alvaro Herrera alvhe...@commandprompt.com writes: Tom Lane wrote: Hmm, I had been assuming we wouldn't need that anymore. The comment in user.c and dbcommands.c says [...] so I think those ones are still necessary. Yeah, after a look through the code I think you can trust the associated comments: if it says it needs sync commit, put in ForceSyncCommit, else we don't need it. regards, tom lane