Re: [HACKERS] [WIP] In-place upgrade
Robert Haas wrote:
>>> 1. htup and bufpage API clean up
>>> 2. HeapTuple version extension + code cleanup
>>> 3. In-place online upgrade
>>> 4. Extending pg_class info + more flexible TOAST chunk size
>> Big thanks for your review. I think #1 is still partially valid, because it contains general cleanups, but part of it is not necessary now. #2, #3 and #4 you can move to the "Returned with feedback" section.
> OK, when can you submit a new version of #1 with the parts that are still valid, updated to CVS HEAD, etc?

It is not a priority right now. I'm working on space reservation first.

Thanks

Zdenek
Re: [HACKERS] [WIP] In-place upgrade
Alvaro Herrera wrote:
> Robert Haas wrote:
>> With respect to #4, I know that Alvaro submitted a draft patch, but I'm not clear on whether that needs to be reviewed, because:
>> - I'm not sure whether it's close enough to being finished for a review to be a good use of time.
>> - I'm not sure how much you and Heikki have already reviewed it.
>> - I'm not sure whether this patch buys us anything by itself.
> I finished that patch, but I didn't submit it because in later discussion it turned out (at least as I read it) that it's considered to be unnecessary.

From the pg_upgrade perspective, it is something we will need to do anyway, because TOAST_MAX_CHUNK_SIZE will be different in 8.5 (if you commit CRC). Then we will need the patch for 8.5. It is not necessary for the 8.3->8.4 upgrade because TOAST_MAX_CHUNK_SIZE is the same, and making this change to the TOAST table now would add unnecessary complexity.

Zdenek
Re: [HACKERS] [WIP] In-place upgrade
Robert Haas wrote:
> With respect to #4, I know that Alvaro submitted a draft patch, but I'm not clear on whether that needs to be reviewed, because:
>
> - I'm not sure whether it's close enough to being finished for a review to be a good use of time.
> - I'm not sure how much you and Heikki have already reviewed it.
> - I'm not sure whether this patch buys us anything by itself.

I finished that patch, but I didn't submit it because in later discussion it turned out (at least as I read it) that it's considered to be unnecessary.

--
Alvaro Herrera                          http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] [WIP] In-place upgrade
>> 1. htup and bufpage API clean up
>> 2. HeapTuple version extension + code cleanup
>> 3. In-place online upgrade
>> 4. Extending pg_class info + more flexible TOAST chunk size
> Big thanks for your review. I think #1 is still partially valid, because it contains general cleanups, but part of it is not necessary now. #2, #3 and #4 you can move to the "Returned with feedback" section.

OK, when can you submit a new version of #1 with the parts that are still valid, updated to CVS HEAD, etc?

Thanks,

...Robert
Re: [HACKERS] [WIP] In-place upgrade
Robert,

Big thanks for your review. I think #1 is still partially valid, because it contains general cleanups, but part of it is not necessary now. #2, #3 and #4 you can move to the "Returned with feedback" section.

Thanks

Zdenek

Robert Haas wrote:
> Zdenek - I am a bit murky on where we stand with upgrade-in-place in terms of reviewing. Initially, you had submitted four patches for this commitfest:
>
> 1. htup and bufpage API clean up
> 2. HeapTuple version extension + code cleanup
> 3. In-place online upgrade
> 4. Extending pg_class info + more flexible TOAST chunk size
>
> I think that it was decided that replacing the heap tuple access macros with function calls was not acceptable, so I have moved patches #1 and #2 to the "Returned with feedback" section. I thought that perhaps the third patch could be salvaged, but the consensus seemed to be to go in a new direction, so I'm thinking that one should probably be moved to "Returned with feedback" as well. However, I'm not clear on whether you will be submitting something else instead and whether that thing should be considered material for this commitfest. Can you let me know how you are thinking about this?
>
> With respect to #4, I know that Alvaro submitted a draft patch, but I'm not clear on whether that needs to be reviewed, because:
>
> - I'm not sure whether it's close enough to being finished for a review to be a good use of time.
> - I'm not sure how much you and Heikki have already reviewed it.
> - I'm not sure whether this patch buys us anything by itself.
>
> Thoughts?
>
> ...Robert
Re: [HACKERS] [WIP] In-place upgrade
Zdenek -

I am a bit murky on where we stand with upgrade-in-place in terms of reviewing. Initially, you had submitted four patches for this commitfest:

1. htup and bufpage API clean up
2. HeapTuple version extension + code cleanup
3. In-place online upgrade
4. Extending pg_class info + more flexible TOAST chunk size

I think that it was decided that replacing the heap tuple access macros with function calls was not acceptable, so I have moved patches #1 and #2 to the "Returned with feedback" section. I thought that perhaps the third patch could be salvaged, but the consensus seemed to be to go in a new direction, so I'm thinking that one should probably be moved to "Returned with feedback" as well. However, I'm not clear on whether you will be submitting something else instead and whether that thing should be considered material for this commitfest. Can you let me know how you are thinking about this?

With respect to #4, I know that Alvaro submitted a draft patch, but I'm not clear on whether that needs to be reviewed, because:

- I'm not sure whether it's close enough to being finished for a review to be a good use of time.
- I'm not sure how much you and Heikki have already reviewed it.
- I'm not sure whether this patch buys us anything by itself.

Thoughts?

...Robert
Re: [HACKERS] [WIP] In-place upgrade
On Nov 9, 2008, at 11:09 PM, Joshua D. Drake wrote:
>> I think it's time for people to stop asking for the moon and realize that if we don't constrain this feature pretty darn tightly, we will have *nothing at all* for 8.4. Again.
> Gotta go with Tom on this one. The idea that we would somehow upgrade from 8.1 to 8.4 is silly. Yes, it will be unfortunate for those running 8.1, but keeping track of multiple versions like that is going to be entirely too expensive.

I agree as well. If we can get at least the base-level stuff into 8.4 so that 8.5 and beyond are in-place upgradable, then that is a huge win. If we could support 8.2 or 8.3 or 6.5 :) that would be nice, but I think dealing with everything retroactively will cause our heads to explode and a mountain of awful code to arise. If we say "8.4 and beyond will be upgradable" we can toss in everything we think we'll need to deal with it and not worry about the retroactive case (unless someone has a really clever(tm) idea!). This can't be an original problem to solve; too many other databases do it as well.

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/
Re: [HACKERS] [WIP] In-place upgrade
On Mon, 2008-11-10 at 09:14 -0500, Matthew T. O'Connor wrote:
> Tom Lane wrote:
>> Decibel! <[EMAIL PROTECTED]> writes:
>>> I think that's pretty seriously un-desirable. It's not at all uncommon for databases to stick around for a very long time and then jump ahead many versions. I don't think we want to tell people they can't do that.
>> Of course they can do that --- they just have to do it one version at a time.
> Also, people may be less likely to stick with an old outdated version for years and years if the upgrade process is easier.

Kind of OT, but I don't agree with this. There will always be those who are willing to upgrade just because they can, but the smart play is to upgrade because you need to. If anything, in-place upgrade is just going to remove the last real business and technical barrier to using PostgreSQL for enterprises.

Joshua D. Drake
Re: [HACKERS] [WIP] In-place upgrade
Tom Lane wrote:
> Decibel! <[EMAIL PROTECTED]> writes:
>> I think that's pretty seriously un-desirable. It's not at all uncommon for databases to stick around for a very long time and then jump ahead many versions. I don't think we want to tell people they can't do that.
> Of course they can do that --- they just have to do it one version at a time.

Also, people may be less likely to stick with an old outdated version for years and years if the upgrade process is easier.
Re: [HACKERS] [WIP] In-place upgrade
Decibel! wrote:
> Unless I'm mistaken, there are only two cases we care about for additional space: per-page and per-tuple.

Yes. And maybe the special space of indexes could be extended, but that is covered by the per-page setting.

> Those requirements could also vary for different types of pg_class objects. What we need is an API that allows an administrator to tell the database to start setting this space aside. One possibility:

We need an API or mechanism for how in-place upgrade will set it up. It must be done by in-place upgrade.

> relkind: Essentially, heap vs toast, though I suppose it's possible we might need this for sequences.

Sequences are converted during catalog upgrade.

> Once we have an API, we need to get users to make use of it. I'm thinking add something like the following to the release notes: "To upgrade from a prior version to 8.4, you will need to run some of the following commands, depending on what version you are currently using:

It is too complicated. For one thing, it also depends on the architecture, and it can easily be computed by the in-place upgrade script. All you need is to run a script which does all the settings for you. You can obtain it from the next version (IIRC Oracle does it this way), or we can add this configuration script to the previous version during a minor update.

> OTOH, we might not want to go mucking around with changing the catalog for older versions (I'm not even sure if we can). So perhaps it would be better to store this information in a separate table, or maybe a separate file. That might be best anyway; we generally wouldn't need this information, so it would be nice if it wasn't bloating pg_class all the time.

That is why I selected reloptions for storing this configuration parameter: they are supported from 8.2 on, and the 8.1->8.2 upgrade works fine.

Zdenek
Re: [HACKERS] [WIP] In-place upgrade
On Sun, 2008-11-09 at 20:02 -0500, Tom Lane wrote:
> Decibel! <[EMAIL PROTECTED]> writes:
>> I think that's pretty seriously un-desirable. It's not at all uncommon for databases to stick around for a very long time and then jump ahead many versions. I don't think we want to tell people they can't do that.
>
> Of course they can do that --- they just have to do it one version at a time.
>
> I think it's time for people to stop asking for the moon and realize that if we don't constrain this feature pretty darn tightly, we will have *nothing at all* for 8.4. Again.

Gotta go with Tom on this one. The idea that we would somehow upgrade from 8.1 to 8.4 is silly. Yes, it will be unfortunate for those running 8.1, but keeping track of multiple versions like that is going to be entirely too expensive. At some point it won't matter, but right now it really does.

Joshua D. Drake
Re: [HACKERS] [WIP] In-place upgrade
Decibel! <[EMAIL PROTECTED]> writes:
> I think that's pretty seriously un-desirable. It's not at all uncommon for databases to stick around for a very long time and then jump ahead many versions. I don't think we want to tell people they can't do that.

Of course they can do that --- they just have to do it one version at a time.

I think it's time for people to stop asking for the moon and realize that if we don't constrain this feature pretty darn tightly, we will have *nothing at all* for 8.4. Again.

regards, tom lane
Re: [HACKERS] [WIP] In-place upgrade
On Nov 6, 2008, at 1:31 PM, Bruce Momjian wrote:
>> 3. What about multi-release upgrades? Say someone wants to upgrade from 8.3 to 8.6. 8.6 only knows how to read pages that are 8.5-and-a-half or better, 8.5 only knows how to read pages that are 8.4-and-a-half or better, and 8.4 only knows how to read pages that are 8.3-and-a-half or better. So the user will have to upgrade to 8.3.MAX, then 8.4.MAX, then 8.5.MAX, and then 8.6.
> Yes.

I think that's pretty seriously un-desirable. It's not at all uncommon for databases to stick around for a very long time and then jump ahead many versions. I don't think we want to tell people they can't do that.

More importantly, I think we're barking up the wrong tree by putting migration knowledge into old versions. All that the old versions need to do is guarantee a specific amount of free space per page. We should provide a mechanism to tell a cluster what that free space requirement is, and not hard-code it into the backend.

Unless I'm mistaken, there are only two cases we care about for additional space: per-page and per-tuple. Those requirements could also vary for different types of pg_class objects. What we need is an API that allows an administrator to tell the database to start setting this space aside. One possibility:

pg_min_free_space( version, relkind, bytes_per_page, bytes_per_tuple );
pg_min_free_space_index( version, indexkind, bytes_per_page, bytes_per_tuple );

version: This would be provided as a safety mechanism. You would have to provide the major version that matches what the backend is running. See below for an example.

relkind: Essentially, heap vs toast, though I suppose it's possible we might need this for sequences.

indexkind: Because we support different types of indexes, I think we need to handle them differently than heap/toast. If we wanted, we could have a single function that demands that indexkind is NULL if relkind != 'index'.

bytes_per_(page|tuple): obvious. :)

Once we have an API, we need to get users to make use of it. I'm thinking add something like the following to the release notes:

"To upgrade from a prior version to 8.4, you will need to run some of the following commands, depending on what version you are currently using:

For version 8.3:
SELECT pg_min_free_space( '8.3', 'heap', 4, 12 );
SELECT pg_min_free_space( '8.3', 'toast', 4, 12 );

For version 8.2:
SELECT pg_min_free_space( '8.2', 'heap', 14, 12 );
SELECT pg_min_free_space( '8.2', 'toast', 14, 12 );
SELECT pg_min_free_space_index( '8.2', 'b-tree', 4, 4 );"

(Note I'm just pulling numbers out of thin air in this example.)

As you can see, we pass in the version number to ensure that if someone accidentally cut-and-pastes the wrong stuff, they know what they did wrong immediately. One downside to this scheme is that it doesn't provide a mechanism to ensure that all required minimum free space requirements were passed in. Perhaps we want a function that takes an array of complex types and forces you to supply information for all known storage mechanisms. Another possibility would be to pass in some kind of binary format that contains a checksum.

Even if we do come up with a pretty fool-proof way to tell the old version what free space it needs to set aside, I think we should still have a mechanism for the new version to know exactly what the old version has set aside, and whether that has actually been accomplished or not. One option that comes to mind is to add min_free_space_per_page and min_free_space_per_tuple to pg_class. Normally these fields would be NULL; the old version would only set them once it had verified that all pages in a given relation met those requirements (presumably via vacuum). The new version would check all these values on startup to ensure they made sense.

OTOH, we might not want to go mucking around with changing the catalog for older versions (I'm not even sure if we can). So perhaps it would be better to store this information in a separate table, or maybe a separate file. That might be best anyway; we generally wouldn't need this information, so it would be nice if it wasn't bloating pg_class all the time.

--
Decibel!, aka Jim C. Nasby, Database Architect   [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828
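As a rough illustration of how such a reservation could be enforced in the insert path, here is a minimal, self-contained C sketch. Everything in it (ReservedSpace, page_has_room, the way the per-page and per-tuple numbers combine) is hypothetical and not an existing PostgreSQL API; it only shows the kind of check the per-page/per-tuple numbers above would feed.

/*
 * Hypothetical sketch: gate tuple placement on a per-relation
 * minimum-free-space reservation of the kind proposed above.
 */
#include <stdbool.h>
#include <stddef.h>

typedef struct ReservedSpace
{
    size_t bytes_per_page;   /* keep this much of every page unused */
    size_t bytes_per_tuple;  /* plus this much for every tuple on the page */
} ReservedSpace;

/*
 * Return true if a new tuple of 'tuple_len' bytes may be placed on a page
 * that currently holds 'ntuples' tuples and has 'free_bytes' free, while
 * still honoring the reservation needed by the next major version.
 */
bool
page_has_room(size_t free_bytes, int ntuples, size_t tuple_len,
              const ReservedSpace *rs)
{
    /* space that must stay free after the insert: the per-page reservation
     * plus the per-tuple reservation for the existing tuples and the new one */
    size_t must_keep = rs->bytes_per_page +
                       (size_t) (ntuples + 1) * rs->bytes_per_tuple;

    return free_bytes >= tuple_len + must_keep;
}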
Re: [HACKERS] [WIP] In-place upgrade
Zdenek Kotala <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> * Add a "format serial number" column to pg_class, and probably also pg_database. Rather like the frozenxid columns, this would have the semantics that all pages in a relation or database are known to have at least the specified format number.
> I prefer to have the latest processed block. InvalidBlockNumber would mean nothing is processed and 0 means everything is already reserved. I suggest processing it backward. That should prevent checking newly extended blocks, which will already be correctly set up.

That seems bizarre and not very helpful. In the first place, if we're driving it off vacuum there would be no opportunity for recording a half-processed state value. In the second place, this formulation fails to provide any evidence of *what* processing you completed or didn't complete. In a multi-step upgrade sequence I think it's going to be a mess if we aren't explicit about that.

regards, tom lane
Re: [HACKERS] [WIP] In-place upgrade
Tom Lane wrote:
> I think we can have a notion of pre-upgrade maintenance, but it would have to be integrated into normal operations. For instance, if conversion to 8.4 requires extra free space, we'd make late releases of 8.3.x not only be able to force that to occur, but also tweak the normal code paths to maintain that minimum free space.

OK. I will focus on this. I guess this approach revives my hook patch:
http://archives.postgresql.org/pgsql-hackers/2008-04/msg00990.php

> The full concept as I understood it (dunno why Bruce left all these details out of his message) went like this:
>
> * Add a "format serial number" column to pg_class, and probably also pg_database. Rather like the frozenxid columns, this would have the semantics that all pages in a relation or database are known to have at least the specified format number.
>
> * There would actually be two serial numbers per release, at least for releases where pre-update prep work is involved --- for instance, between 8.3 and 8.4 there'd be an "8.3-and-a-half" format which is 8.3 but known ready to update to 8.4 (eg, enough free space available). Minor releases of 8.3 that appear with or subsequent to 8.4 release understand the "half" format number and how to upgrade to it.

I prefer to have the latest processed block. InvalidBlockNumber would mean nothing is processed and 0 means everything is already reserved. I suggest processing it backward. That should prevent checking newly extended blocks, which will already be correctly set up.

> * VACUUM would be empowered, in the same way as it handles frozenxid maintenance, to update any less-than-the-latest-version pages and then fix the pg_class and pg_database entries.
>
> * We could mechanically enforce that you not update until the database is ready for it by checking pg_database.datformatversion during postmaster startup.

I don't understand you here. Do you mean on the old server version or the new server version? Who will perform this check? Do not forget that we currently do catalog conversion by dump and import, which loses all extended information.

Thanks

Zdenek

--
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql
Re: [HACKERS] [WIP] In-place upgrade
Tom Lane wrote:
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
>> Adding catalog columns seems rather complicated, and not back-patchable.
> Agreed, we'd not be able to make them retroactively appear in 8.3.
>> I imagined that you would have just a single cluster-wide variable, a GUC perhaps, indicating how much space should be reserved by updates/inserts. Then you'd have an additional program, perhaps a new contrib module, that sets the variable to the right value for the version you're upgrading, and scans through all tables, moving tuples so that every page has enough free space for the upgrade. After that's done, it'd set a flag in the data directory indicating that the cluster is ready for upgrade.
> Possibly that could work. The main thing is to have a way of being sure that the prep work has been completed on every page of the database. The disadvantage of not having catalog support is that you'd have to complete the entire scan operation in one go to be sure you'd hit everything.

I prefer to have catalog support. Especially on very long tables it helps when somebody stops the pre-upgrade script for some reason.

> Another thought here is that I don't think we are yet committed to any changes that require extra space between 8.3 and 8.4, are we? The proposed addition of CRC words could be put off to 8.5, for instance. So it seems at least within reach to not require any preparatory steps for 8.3-to-8.4, and put the infrastructure in place now to support such steps in future go-rounds.

Yeah. We still have V4 without any storage modification (excluding the HASH index). However, I think if reloptions are used for storing information about reserved space then it shouldn't be a problem. But we need to be sure it is possible.

Zdenek

--
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql
Re: [HACKERS] [WIP] In-place upgrade
Heikki Linnakangas wrote:
> Tom Lane wrote:
>> I think we can have a notion of pre-upgrade maintenance, but it would have to be integrated into normal operations. For instance, if conversion to 8.4 requires extra free space, we'd make late releases of 8.3.x not only be able to force that to occur, but also tweak the normal code paths to maintain that minimum free space.
> Agreed, the backend needs to be modified to reserve the space.
>> The full concept as I understood it (dunno why Bruce left all these details out of his message) went like this:
>> * Add a "format serial number" column to pg_class, and probably also pg_database. Rather like the frozenxid columns, this would have the semantics that all pages in a relation or database are known to have at least the specified format number.
>> * There would actually be two serial numbers per release, at least for releases where pre-update prep work is involved --- for instance, between 8.3 and 8.4 there'd be an "8.3-and-a-half" format which is 8.3 but known ready to update to 8.4 (eg, enough free space available). Minor releases of 8.3 that appear with or subsequent to 8.4 release understand the "half" format number and how to upgrade to it.
>> * VACUUM would be empowered, in the same way as it handles frozenxid maintenance, to update any less-than-the-latest-version pages and then fix the pg_class and pg_database entries.
>> * We could mechanically enforce that you not update until the database is ready for it by checking pg_database.datformatversion during postmaster startup.
> Adding catalog columns seems rather complicated, and not back-patchable. Not backpatchable means that we'd need to be sure now that the format serial numbers are enough for the upcoming 8.4-8.5 upgrade.

Reloptions are suitable for keeping the amount of reserved space, and they can be backported into 8.3 and 8.2. And of course there is no problem converting 8.1->8.2. For the backported branches it would be better to combine the internal modification to reserve space with, e.g., a stored procedure which checks all relations. In 8.4 and newer, pg_class could be extended with new attributes.

> I imagined that you would have just a single cluster-wide variable, a GUC perhaps, indicating how much space should be reserved by updates/inserts.

You sometimes need a different reserved size for different types of relations. For example, on 32-bit x86 you don't need to reserve space for heap but you do need to for indexes (between V3->V4). It is better to use reloptions and have the pre-upgrade procedure set this information correctly.

> Then you'd have an additional program, perhaps a new contrib module, that sets the variable to the right value for the version you're upgrading, and scans through all tables, moving tuples so that every page has enough free space for the upgrade. After that's done, it'd set a flag in the data directory indicating that the cluster is ready for upgrade.

I prefer to have this information in pg_class; it is accessible by SQL commands. pg_class should also contain information about the last checked page, to prevent repeated checks on very large tables.

> The tool could run concurrently with normal activity, so you could just let it run for as long as it takes.

Agree.

Zdenek

--
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql
Re: [HACKERS] [WIP] In-place upgrade
On Thu, 6 Nov 2008, Tom Lane wrote:
>> - Is it worth considering making CRCs an optional compile-time feature, and that (for now at least) you couldn't get them and the in-place upgrade at the same time?
> Hmm ... might be better than not offering them in 8.4 at all, but the thing is that then you are asking packagers to decide for their customers which is more important. And I'd bet you anything you want that in-place upgrade would be their choice.

I was thinking of something similar to how --enable-thread-safety has been rolled out. It could be hanging around there and available to those who want it in their build, even though it might not be available by default in a typical mainstream distribution. Since there's already a GUC for toggling the checksums in the code, internally it could work like debug_assertions, where you only get that option if support was compiled in appropriately.

Just a thought I wanted to throw out there; if it makes eventual upgrades from 8.4 more complicated it may not be worth even considering.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] [WIP] In-place upgrade
Greg Smith <[EMAIL PROTECTED]> writes:
> On Thu, 6 Nov 2008, Tom Lane wrote:
>> Another thought here is that I don't think we are yet committed to any changes that require extra space between 8.3 and 8.4, are we? The proposed addition of CRC words could be put off to 8.5, for instance.
> I was just staring at that code as you wrote this thinking about the same thing. ...
> - Is it worth considering making CRCs an optional compile-time feature, and that (for now at least) you couldn't get them and the in-place upgrade at the same time?

Hmm ... might be better than not offering them in 8.4 at all, but the thing is that then you are asking packagers to decide for their customers which is more important. And I'd bet you anything you want that in-place upgrade would be their choice.

Also, having such an option would create extra complexity for 8.4-to-8.5 upgrades.

regards, tom lane
Re: [HACKERS] [WIP] In-place upgrade
> The idea that you're going to get in-place upgrade all the way back to 8.2 without taking the database down for even a little bit to run such a utility is hard to pull off, and it's impressive that Zdenek and everyone else involved has gotten so close to doing it.

I think we should at least wait to see what the next version of his patch looks like before making any final decisions.

...Robert
Re: [HACKERS] [WIP] In-place upgrade
On Thu, 6 Nov 2008, Tom Lane wrote:
> Another thought here is that I don't think we are yet committed to any changes that require extra space between 8.3 and 8.4, are we? The proposed addition of CRC words could be put off to 8.5, for instance.

I was just staring at that code as you wrote this thinking about the same thing. CRCs are a great feature I'd really like to see. On the other hand, announcing that 8.4 features in-place upgrades for 8.3 databases, and that the project has laid the infrastructure such that future releases will also upgrade in-place, would IMHO be the biggest positive announcement of the new release by a large margin. At least then new large (>1TB) installs could kick off on either the stable 8.3 or 8.4 knowing they'd never be forced to deal with dump/reload, whereas right now there is no reasonable solution for them that involves PostgreSQL (I just crossed 3TB on a system last month and I'm not looking forward to its future upgrades).

Two questions come to mind here:

- If you reduce the page layout upgrade problem to "convert from V4 to V5 adding support for CRCs", is there a worthwhile simpler path to handling that without dragging the full complexity of the older page layout changes in?

- Is it worth considering making CRCs an optional compile-time feature, and that (for now at least) you couldn't get them and the in-place upgrade at the same time?

Stepping back for a second, the idea that in-place upgrade is only worthwhile if it yields zero downtime isn't necessarily the case. Even having an offline-only upgrade tool to handle the more complicated situations where tuples have to be squeezed onto another page would still be a major improvement over the current situation. The thing that you have to recognize here is that dump/reload is extremely slow because of bottlenecks in the COPY process. That makes for a large amount of downtime--many hours isn't unusual. If older-version upgrade downtime were reduced to how long it takes to run a "must scan every page and fiddle with it if full" tool, that would still be a giant improvement over the current state of things. If Zdenek's figures that only a small percentage of pages will need such adjustment hold up, that should take only some factor longer than a sequential scan of the whole database. That's not instant, but it's at least an order of magnitude faster than a dump/reload on a big system.

The idea that you're going to get in-place upgrade all the way back to 8.2 without taking the database down for even a little bit to run such a utility is hard to pull off, and it's impressive that Zdenek and everyone else involved has gotten so close to doing it. I personally am on the fence as to whether it's worth paying even the 1% penalty for that implementation all the time just to get in-place upgrades. If an offline utility with reasonable (scan instead of dump/reload) downtime and closer to zero overhead when finished were available instead, that might be a more reasonable trade-off to make for handling older releases. There are so many bottlenecks in the older versions that you're less likely to find a database too large to dump and reload there anyway. It would also be the case that improvements to that offline utility could continue after 8.4 proper was completely frozen.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] [WIP] In-place upgrade
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Adding catalog columns seems rather complicated, and not back-patchable.

Agreed, we'd not be able to make them retroactively appear in 8.3.

> I imagined that you would have just a single cluster-wide variable, a GUC perhaps, indicating how much space should be reserved by updates/inserts. Then you'd have an additional program, perhaps a new contrib module, that sets the variable to the right value for the version you're upgrading, and scans through all tables, moving tuples so that every page has enough free space for the upgrade. After that's done, it'd set a flag in the data directory indicating that the cluster is ready for upgrade.

Possibly that could work. The main thing is to have a way of being sure that the prep work has been completed on every page of the database. The disadvantage of not having catalog support is that you'd have to complete the entire scan operation in one go to be sure you'd hit everything.

Another thought here is that I don't think we are yet committed to any changes that require extra space between 8.3 and 8.4, are we? The proposed addition of CRC words could be put off to 8.5, for instance. So it seems at least within reach to not require any preparatory steps for 8.3-to-8.4, and put the infrastructure in place now to support such steps in future go-rounds.

regards, tom lane
Re: [HACKERS] [WIP] In-place upgrade
"Robert Haas" <[EMAIL PROTECTED]> writes: > That means, in essence, that the earliest possible version that could > be in-place upgraded would be an 8.4 system - we are giving up > completely on in-place upgrade to 8.4 from any earlier version (which > personally I thought was the whole point of this feature in the first > place). Quite honestly, given where we are in the schedule and the lack of consensus about how to do this, I think we would be well advised to decide right now to forget about supporting in-place upgrade to 8.4, and instead work on allowing in-place upgrades from 8.4 onwards. Shooting for a general-purpose does-it-all scheme that can handle old versions that had no thought of supporting such updates is likely to ensure that we end up with *NOTHING*. What Bruce is proposing, I think, is that we intentionally restrict what we want to accomplish to something that might be within reach now and also sustainable over the long term. Planning to update any version to any other version is *not* sustainable --- we haven't got the resources nor the interest to create large amounts of conversion code. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
Bruce Momjian <[EMAIL PROTECTED]> writes:
> I envision a similar system where we have utilities to guarantee all pages have enough free space, and all pages are the current version, before allowing an upgrade-in-place to the next version. Such a consistent API will make the job for users easier and our job simpler, and with upgrade-in-place, where we have limited time and resources to code this for each release, simplicity is important.

An external utility doesn't seem like the right way to approach it. For example, given the need to ensure X amount of free space in each page, the only way to guarantee that would be to shut down the database while you run the utility over all the pages --- otherwise somebody might fill some page up again. And that completely defeats the purpose, which is to have minimal downtime during upgrade.

I think we can have a notion of pre-upgrade maintenance, but it would have to be integrated into normal operations. For instance, if conversion to 8.4 requires extra free space, we'd make late releases of 8.3.x not only be able to force that to occur, but also tweak the normal code paths to maintain that minimum free space.

The full concept as I understood it (dunno why Bruce left all these details out of his message) went like this:

* Add a "format serial number" column to pg_class, and probably also pg_database. Rather like the frozenxid columns, this would have the semantics that all pages in a relation or database are known to have at least the specified format number.

* There would actually be two serial numbers per release, at least for releases where pre-update prep work is involved --- for instance, between 8.3 and 8.4 there'd be an "8.3-and-a-half" format which is 8.3 but known ready to update to 8.4 (eg, enough free space available). Minor releases of 8.3 that appear with or subsequent to 8.4 release understand the "half" format number and how to upgrade to it.

* VACUUM would be empowered, in the same way as it handles frozenxid maintenance, to update any less-than-the-latest-version pages and then fix the pg_class and pg_database entries.

* We could mechanically enforce that you not update until the database is ready for it by checking pg_database.datformatversion during postmaster startup.

So the update process would require users to install a suitably late version of 8.3, vacuum everything over a suitable maintenance window, then install 8.4, then perhaps vacuum everything again if they want to try to push page update work into specific maintenance windows. But the DB is up and functioning the whole time.

regards, tom lane
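For readers trying to picture the mechanics, a small, self-contained C sketch of the two sides of this scheme follows: the startup refusal and the vacuum-driven bump. All of the types and names (page_format, database_info, check_ready_for_upgrade, vacuum_bump_format) are invented for illustration; the real thing would live in postmaster startup and in VACUUM, not in a standalone program.

/*
 * Sketch only: hypothetical types and names, not actual PostgreSQL code.
 */
#include <stdbool.h>
#include <stdio.h>

enum page_format
{
    FORMAT_83      = 40,   /* plain 8.3 page layout                  */
    FORMAT_83_HALF = 41,   /* 8.3 layout, but upgrade prep work done */
    FORMAT_84      = 50    /* new 8.4 page layout                    */
};

/* analogous to pg_database.datformatversion in the proposal */
struct database_info
{
    const char      *name;
    enum page_format min_page_format;   /* all pages known to be >= this */
};

/* refuse startup unless the whole database is at least at 'required' */
static bool
check_ready_for_upgrade(const struct database_info *db, enum page_format required)
{
    if (db->min_page_format < required)
    {
        fprintf(stderr,
                "database \"%s\" is not prepared for upgrade: page format %d, need at least %d\n",
                db->name, (int) db->min_page_format, (int) required);
        return false;
    }
    return true;
}

/* vacuum-style pass: once every page of the database has been brought up to
 * 'target', the catalog marker is advanced, never moved backward */
static void
vacuum_bump_format(struct database_info *db, enum page_format target,
                   bool all_pages_converted)
{
    if (all_pages_converted && db->min_page_format < target)
        db->min_page_format = target;
}

int
main(void)
{
    struct database_info db = { "appdb", FORMAT_83 };

    /* an 8.4 postmaster would refuse to start here ... */
    if (!check_ready_for_upgrade(&db, FORMAT_83_HALF))
    {
        /* ... until a late 8.3.x vacuum has prepared every page */
        vacuum_bump_format(&db, FORMAT_83_HALF, true);
    }
    return check_ready_for_upgrade(&db, FORMAT_83_HALF) ? 0 : 1;
}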
Re: [HACKERS] [WIP] In-place upgrade
> And almost guarantee that the job will never be completed, or tested fully. Remember that in-place upgrades would be pretty painless, so doing multiple major upgrades should not be a difficult requirement, or they can dump/reload their data to skip it.

Regardless of what design is chosen, there's no requirement that we support in-place upgrade from 8.3 to 8.6, or even 8.4 to 8.6, in one shot. But the design that you and Tom are proposing pretty much ensures that it will be impossible. But that's certainly the least important reason not to do it this way.

I think this comment from Heikki is pretty revealing:

> Adding catalog columns seems rather complicated, and not back-patchable. Not backpatchable means that we'd need to be sure now that the format serial numbers are enough for the upcoming 8.4-8.5 upgrade.

That means, in essence, that the earliest possible version that could be in-place upgraded would be an 8.4 system - we are giving up completely on in-place upgrade to 8.4 from any earlier version (which personally I thought was the whole point of this feature in the first place). And we'll only be able to in-place upgrade to 8.5 if the unproven assumption that these catalog changes are sufficient turns out to be true, or if whatever other changes turn out to be necessary are back-patchable.

...Robert
Re: [HACKERS] [WIP] In-place upgrade
Robert Haas wrote:
>> That's all fine and dandy, except that it presumes that you can perform SELECT/UPDATE/DELETE on V3 tuple versions; you can't just pretend that A-E aren't there until they get converted. Which is exactly the overhead we were looking to avoid.
>
> I don't understand this comment at all. Unless you have some sort of magical wand in your back pocket that will instantaneously transform the entire database, there is going to be a period of time when you have to cope with both V3 and V4 pages. ISTM that what we should be talking about here is:
>
> (1) How are we going to do that in a way that imposes near-zero overhead once the entire database has been converted?
> (2) How are we going to do that in a way that is minimally invasive to the code?
> (3) Can we accomplish (1) and (2) while still retaining somewhat reasonable performance for V3 pages?
>
> Zdenek's initial proposal did this by replacing all of the tuple header macros with functions that were conditionalized on page version. I think we agree that's not going to work. That doesn't mean that there is no approach that can work, and we were discussing possible ways to make it work upthread until the thread got hijacked to discuss the right way of handling page expansion. Now that it seems we agree that a transaction can be used to move tuples onto new pages, I think we'd be well served to stop talking about page expansion and get back to the original topic: where and how to insert the hooks for V3 tuple handling.

I think the above is a good summary. For me, the problem with any approach that has information about prior-version block formats in the main code path is code complexity, and secondarily performance.

I know there is concern that converting all blocks on read-in might expand the page beyond 8k in size. One idea Heikki had was to require that some tool be run on minor releases before a major upgrade to guarantee there is enough free space to convert the block to the current format on read-in, which would localize the information about prior block formats. We could release the tool in minor branches around the same time as a major release. Also consider that there are very few releases that expand the page size. For these reasons, the expand-the-page-beyond-8k problem should not be dictating what approach we take for upgrade-in-place, because there are workarounds for the problem, and the problem is rare.

I would like us to again focus on converting the pages to the current version format on read-in, and perhaps a tool to convert all old pages to the new format.

FYI, we are also going to need the ability to convert all pages to the current format for multi-release upgrades. For example, if you did upgrade-in-place from 8.2 to 8.3, you are going to need to update all pages to the 8.3 format before doing upgrade-in-place to 8.4; perhaps vacuum can do something like this on a per-table basis, and we can record that status in a pg_class column.

Also, consider that when we did PITR, we required commands before and after the tar so that there was a consistent API for PITR, and later had to add capabilities to those functions, but the user API didn't change. I envision a similar system where we have utilities to guarantee all pages have enough free space, and all pages are the current version, before allowing an upgrade-in-place to the next version. Such a consistent API will make the job for users easier and our job simpler, and with upgrade-in-place, where we have limited time and resources to code this for each release, simplicity is important.

--
Bruce Momjian <[EMAIL PROTECTED]>   http://momjian.us
EnterpriseDB                        http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
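A minimal C sketch of the convert-on-read-in idea follows, with entirely hypothetical structures and function names (the PageHeader here is simplified, and the converters are stubs that elide the real tuple rework); it only illustrates the control flow of upgrading a page one version at a time as it is read, assuming the pre-upgrade step already guaranteed the converted page still fits in its block.

/*
 * Hypothetical sketch of "convert on read-in".
 */
#include <stdint.h>
#include <stdbool.h>

#define CURRENT_PAGE_VERSION 5

typedef struct PageHeader
{
    uint16_t version;    /* on-disk page layout version */
    uint16_t lower;      /* start of free space */
    uint16_t upper;      /* end of free space */
    /* ... line pointers and tuples follow in a real page ... */
} PageHeader;

/* Stub converters: the real ones would rework line pointers and tuple
 * headers in place, which is why enough free space must be reserved. */
static bool
convert_v3_to_v4(char *page)
{
    ((PageHeader *) page)->version = 4;
    return true;
}

static bool
convert_v4_to_v5(char *page)
{
    ((PageHeader *) page)->version = 5;
    return true;
}

/* Called right after a block is read into a shared buffer: upgrade the page
 * one version at a time until it matches the running server's format, so a
 * V3 page read by a V5 server goes V3 -> V4 -> V5 before anyone sees it. */
bool
convert_page_on_read(char *page)
{
    PageHeader *hdr = (PageHeader *) page;

    while (hdr->version < CURRENT_PAGE_VERSION)
    {
        bool ok;

        switch (hdr->version)
        {
            case 3:
                ok = convert_v3_to_v4(page);
                break;
            case 4:
                ok = convert_v4_to_v5(page);
                break;
            default:
                return false;    /* unknown or too-old page format */
        }
        if (!ok)
            return false;        /* e.g. not enough reserved free space */
    }
    return true;
}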
Re: [HACKERS] [WIP] In-place upgrade
Tom Lane wrote:
> I think we can have a notion of pre-upgrade maintenance, but it would have to be integrated into normal operations. For instance, if conversion to 8.4 requires extra free space, we'd make late releases of 8.3.x not only be able to force that to occur, but also tweak the normal code paths to maintain that minimum free space.

Agreed, the backend needs to be modified to reserve the space.

> The full concept as I understood it (dunno why Bruce left all these details out of his message) went like this:
>
> * Add a "format serial number" column to pg_class, and probably also pg_database. Rather like the frozenxid columns, this would have the semantics that all pages in a relation or database are known to have at least the specified format number.
>
> * There would actually be two serial numbers per release, at least for releases where pre-update prep work is involved --- for instance, between 8.3 and 8.4 there'd be an "8.3-and-a-half" format which is 8.3 but known ready to update to 8.4 (eg, enough free space available). Minor releases of 8.3 that appear with or subsequent to 8.4 release understand the "half" format number and how to upgrade to it.
>
> * VACUUM would be empowered, in the same way as it handles frozenxid maintenance, to update any less-than-the-latest-version pages and then fix the pg_class and pg_database entries.
>
> * We could mechanically enforce that you not update until the database is ready for it by checking pg_database.datformatversion during postmaster startup.

Adding catalog columns seems rather complicated, and not back-patchable. Not backpatchable means that we'd need to be sure now that the format serial numbers are enough for the upcoming 8.4-8.5 upgrade.

I imagined that you would have just a single cluster-wide variable, a GUC perhaps, indicating how much space should be reserved by updates/inserts. Then you'd have an additional program, perhaps a new contrib module, that sets the variable to the right value for the version you're upgrading, and scans through all tables, moving tuples so that every page has enough free space for the upgrade. After that's done, it'd set a flag in the data directory indicating that the cluster is ready for upgrade. The tool could run concurrently with normal activity, so you could just let it run for as long as it takes.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
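A rough C sketch of what the outer loop of such a contrib-style scanner might look like follows. The storage-access calls are placeholders (declared but not implemented here) and every name is hypothetical; the point is only the scan-everything-then-set-a-flag behaviour described above.

/*
 * Hypothetical outer loop of a pre-upgrade scanner: walk every page of every
 * table, move tuples until each page has the free space the next release
 * needs, then drop a flag file in the data directory.
 */
#include <stdbool.h>
#include <stdio.h>

/* placeholders for real storage access; an actual tool would go through
 * the backend (or a server-side function) rather than standalone code */
extern int    table_count(void);
extern int    page_count(int table);
extern size_t page_free_space(int table, int page);
extern bool   move_tuples_from_page(int table, int page, size_t bytes_needed);

bool
prepare_cluster(size_t required_free_space, const char *datadir)
{
    for (int t = 0; t < table_count(); t++)
    {
        for (int p = 0; p < page_count(t); p++)
        {
            size_t free_bytes = page_free_space(t, p);

            if (free_bytes < required_free_space &&
                !move_tuples_from_page(t, p, required_free_space - free_bytes))
                return false;   /* could not make room; report and stop */
        }
    }

    /* every page now carries the reservation: mark the cluster ready */
    char flag_path[1024];
    snprintf(flag_path, sizeof(flag_path), "%s/upgrade_ready", datadir);

    FILE *f = fopen(flag_path, "w");
    if (f == NULL)
        return false;
    fclose(f);
    return true;
}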
Re: [HACKERS] [WIP] In-place upgrade
Robert Haas wrote:
> The second point could probably be addressed with a GUC but the first one certainly can't.
>
> 3. What about multi-release upgrades? Say someone wants to upgrade from 8.3 to 8.6. 8.6 only knows how to read pages that are 8.5-and-a-half or better, 8.5 only knows how to read pages that are 8.4-and-a-half or better, and 8.4 only knows how to read pages that are 8.3-and-a-half or better. So the user will have to upgrade to 8.3.MAX, then 8.4.MAX, then 8.5.MAX, and then 8.6.

Yes.

> It seems to me that if there is any way to put all of the logic to handle old page versions in the new code that would be much better, especially if it's an optional feature that can be compiled in or not. Then when it's time to upgrade from 8.3 to 8.6 you could do:
>
> ./configure --with-upgrade-83 --with-upgrade-84 --with-upgrade-85
>
> but if you don't need the code to handle old page versions you can:
>
> ./configure --without-upgrade-85
>
> Admittedly, this requires making the new code capable of rearranging pages to create free space when necessary, and to be able to continue to execute queries while doing it, but ways of doing this have been proposed. The only uncertainty is as to whether the performance and code complexity can be kept manageable, but I don't believe that question has been explored to the point where we should be ready to declare defeat.

And almost guarantee that the job will never be completed, or tested fully. Remember that in-place upgrades would be pretty painless, so doing multiple major upgrades should not be a difficult requirement, or they can dump/reload their data to skip it.

--
Bruce Momjian <[EMAIL PROTECTED]>   http://momjian.us
EnterpriseDB                        http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] [WIP] In-place upgrade
> An external utility doesn't seem like the right way to approach it. For example, given the need to ensure X amount of free space in each page, the only way to guarantee that would be to shut down the database while you run the utility over all the pages --- otherwise somebody might fill some page up again. And that completely defeats the purpose, which is to have minimal downtime during upgrade.

Agreed.

> I think we can have a notion of pre-upgrade maintenance, but it would have to be integrated into normal operations. For instance, if conversion to 8.4 requires extra free space, we'd make late releases of 8.3.x not only be able to force that to occur, but also tweak the normal code paths to maintain that minimum free space.

1. This seems to fly in the face of the sort of thing we've traditionally back-patched. The code to make pages ready for upgrade to the next major release will not necessarily be straightforward (in fact it probably isn't, otherwise we wouldn't have insisted on a two-stage conversion process), which turns a seemingly safe minor upgrade into a potentially dangerous operation.

2. Just because I want to upgrade to 8.3.47 and get the latest bug fixes does not mean that I have any intention of upgrading to 8.4, and yet you've rearranged all of my pages to have useless free space in them (possibly at considerable and unexpected I/O cost for at least as long as the conversion is running).

The second point could probably be addressed with a GUC but the first one certainly can't.

3. What about multi-release upgrades? Say someone wants to upgrade from 8.3 to 8.6. 8.6 only knows how to read pages that are 8.5-and-a-half or better, 8.5 only knows how to read pages that are 8.4-and-a-half or better, and 8.4 only knows how to read pages that are 8.3-and-a-half or better. So the user will have to upgrade to 8.3.MAX, then 8.4.MAX, then 8.5.MAX, and then 8.6.

It seems to me that if there is any way to put all of the logic to handle old page versions in the new code that would be much better, especially if it's an optional feature that can be compiled in or not. Then when it's time to upgrade from 8.3 to 8.6 you could do:

./configure --with-upgrade-83 --with-upgrade-84 --with-upgrade-85

but if you don't need the code to handle old page versions you can:

./configure --without-upgrade-85

Admittedly, this requires making the new code capable of rearranging pages to create free space when necessary, and to be able to continue to execute queries while doing it, but ways of doing this have been proposed. The only uncertainty is as to whether the performance and code complexity can be kept manageable, but I don't believe that question has been explored to the point where we should be ready to declare defeat.

...Robert
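A small C sketch of how such compile-time support could be wired in follows. The UPGRADE_FROM_83 macro and the read_tuple_* functions are purely illustrative, mirroring the hypothetical configure flags above; none of this is existing PostgreSQL code, and the stubs elide the actual decoding.

/*
 * Hypothetical sketch: old-format readers are only compiled into the
 * server when the matching upgrade option is configured.
 */
#include <stdbool.h>
#include <stddef.h>

typedef struct Tuple { const char *data; size_t len; } Tuple;

/* current-format reader: always built */
static bool
read_tuple_v4(const char *page, int item, Tuple *out)
{
    (void) page; (void) item; (void) out;   /* real decoding elided */
    return true;
}

#ifdef UPGRADE_FROM_83
/* old-format reader: only built when configured --with-upgrade-83 */
static bool
read_tuple_v3(const char *page, int item, Tuple *out)
{
    (void) page; (void) item; (void) out;   /* real decoding elided */
    return true;
}
#endif

bool
read_tuple_any_version(const char *page, int page_version, int item, Tuple *out)
{
    switch (page_version)
    {
        case 4:
            return read_tuple_v4(page, item, out);
#ifdef UPGRADE_FROM_83
        case 3:
            return read_tuple_v3(page, item, out);
#endif
        default:
            return false;   /* this build does not understand that format */
    }
}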
Re: [HACKERS] [WIP] In-place upgrade
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
>> I envision a similar system where we have utilities to guarantee all pages have enough free space, and all pages are the current version, before allowing an upgrade-in-place to the next version. Such a consistent API will make the job for users easier and our job simpler, and with upgrade-in-place, where we have limited time and resources to code this for each release, simplicity is important.
>
> An external utility doesn't seem like the right way to approach it. For example, given the need to ensure X amount of free space in each page, the only way to guarantee that would be to shut down the database while you run the utility over all the pages --- otherwise somebody might fill some page up again. And that completely defeats the purpose, which is to have minimal downtime during upgrade.
>
> I think we can have a notion of pre-upgrade maintenance, but it would have to be integrated into normal operations. For instance, if conversion to 8.4 requires extra free space, we'd make late releases of 8.3.x not only be able to force that to occur, but also tweak the normal code paths to maintain that minimum free space.
>
> The full concept as I understood it (dunno why Bruce left all these details out of his message) went like this:

Exactly. I didn't go into the implementation details to make it easier for people to see my general goals. Tom's implementation steps are the correct approach, assuming we can get agreement on the general goals.

--
Bruce Momjian <[EMAIL PROTECTED]>   http://momjian.us
EnterpriseDB                        http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] [WIP] In-place upgrade
> That's all fine and dandy, except that it presumes that you can perform SELECT/UPDATE/DELETE on V3 tuple versions; you can't just pretend that A-E aren't there until they get converted. Which is exactly the overhead we were looking to avoid.

I don't understand this comment at all. Unless you have some sort of magical wand in your back pocket that will instantaneously transform the entire database, there is going to be a period of time when you have to cope with both V3 and V4 pages. ISTM that what we should be talking about here is:

(1) How are we going to do that in a way that imposes near-zero overhead once the entire database has been converted?
(2) How are we going to do that in a way that is minimally invasive to the code?
(3) Can we accomplish (1) and (2) while still retaining somewhat reasonable performance for V3 pages?

Zdenek's initial proposal did this by replacing all of the tuple header macros with functions that were conditionalized on page version. I think we agree that's not going to work. That doesn't mean that there is no approach that can work, and we were discussing possible ways to make it work upthread until the thread got hijacked to discuss the right way of handling page expansion. Now that it seems we agree that a transaction can be used to move tuples onto new pages, I think we'd be well served to stop talking about page expansion and get back to the original topic: where and how to insert the hooks for V3 tuple handling.

> (Another small issue is exactly when you convert the index entries, should you be faced with an upgrade that requires that.)

Zdenek set out his thoughts on this point upthread, no need to rehash here.

...Robert
Re: [HACKERS] [WIP] In-place upgrade
Tom Lane napsal(a): "Robert Haas" <[EMAIL PROTECTED]> writes: To spell this out in more detail: Suppose page 123 is a V3 page containing 6 tuples A, B, C, D, E, and F. We examine the page and determine that if we convert this to a V4 page, only five tuples will fit. So we need to get rid of one of the tuples. We begin a transaction and choose F as the victim. Searching the FSM, we discover that page 456 is a V4 page with available free space. We pin and lock pages 123 and 456 just as if we were doing a heap_update. We create F', the V4 version of F, and write it onto page 456. We set xmax on the original F. We peform the corresponding index updates and commit the transaction. Time passes. Eventually F becomes dead. We reclaim the space previously used by F, and page 123 now contains only 5 tuples. This is exactly what we needed in order to convert page F to a V4 page, so we do. That's all fine and dandy, except that it presumes that you can perform SELECT/UPDATE/DELETE on V3 tuple versions; you can't just pretend that A-E aren't there until they get converted. Which is exactly the overhead we were looking to avoid. We want to avoid overhead on V$lastest$ tuples, but I guess small performance gap on old tuple is acceptable. The only way (which I see now) how it should work is to have multi page version processing. And old tuple will be converted when PageGetHepaTuple will be called. However, how Heikki mentioned tuple and page conversion is basic and same for all upgrade method and it should be done first. Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
"Robert Haas" <[EMAIL PROTECTED]> writes: > To spell this out in more detail: > Suppose page 123 is a V3 page containing 6 tuples A, B, C, D, E, and > F. We examine the page and determine that if we convert this to a V4 > page, only five tuples will fit. So we need to get rid of one of the > tuples. We begin a transaction and choose F as the victim. Searching > the FSM, we discover that page 456 is a V4 page with available free > space. We pin and lock pages 123 and 456 just as if we were doing a > heap_update. We create F', the V4 version of F, and write it onto > page 456. We set xmax on the original F. We peform the corresponding > index updates and commit the transaction. > Time passes. Eventually F becomes dead. We reclaim the space > previously used by F, and page 123 now contains only 5 tuples. This > is exactly what we needed in order to convert page F to a V4 page, so > we do. That's all fine and dandy, except that it presumes that you can perform SELECT/UPDATE/DELETE on V3 tuple versions; you can't just pretend that A-E aren't there until they get converted. Which is exactly the overhead we were looking to avoid. (Another small issue is exactly when you convert the index entries, should you be faced with an upgrade that requires that.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
>>> >> Problem is how to move tuple from page to another and keep indexes in >>> >> sync. >>> >> One solution is to perform some think like "update" operation on the >>> >> tuple. >>> >> But you need exclusive lock on the page and pin counter have to be zero. >>> >> And >>> >> question is where it is safe operation. >>> > >>> > But doesn't this problem go away if you do it in a transaction? You >>> > set xmax on the old tuple, write the new tuple, and add index entries >>> > just as you would for a normal update. >>> >>> But that doesn't actually solve the overflow problem on the old page... >> >> Sure it does. You move just enough tuples that you can convert the page >> without an overflow. > > setting the xmax on a tuple doesn't "move" the tuple Nobody said it did. I think this would have been more clear if you had quoted my whole email instead of stopping in the middle: >> But doesn't this problem go away if you do it in a transaction? You >> set xmax on the old tuple, write the new tuple, and add index entries >> just as you would for a normal update. >> >> When the old tuple is no longer visible to any transaction, you nuke it. To spell this out in more detail: Suppose page 123 is a V3 page containing 6 tuples A, B, C, D, E, and F. We examine the page and determine that if we convert this to a V4 page, only five tuples will fit. So we need to get rid of one of the tuples. We begin a transaction and choose F as the victim. Searching the FSM, we discover that page 456 is a V4 page with available free space. We pin and lock pages 123 and 456 just as if we were doing a heap_update. We create F', the V4 version of F, and write it onto page 456. We set xmax on the original F. We perform the corresponding index updates and commit the transaction. Time passes. Eventually F becomes dead. We reclaim the space previously used by F, and page 123 now contains only 5 tuples. This is exactly what we needed in order to convert page 123 to a V4 page, so we do. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
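The scenario above is essentially a heap_update performed for maintenance rather than for a data change. The sketch below outlines the per-tuple move under that assumption; convert_tuple_v3_to_v4() and update_indexes_for_moved_tuple() are hypothetical stand-ins, locking/WAL/visibility details are omitted, and simple_heap_delete() is used only as shorthand for "set xmax on the original F", so treat this as an outline of the idea rather than working code.

    #include "postgres.h"
    #include "access/heapam.h"
    #include "storage/bufmgr.h"
    #include "storage/bufpage.h"

    /* Hypothetical helpers, named for this illustration only. */
    extern HeapTuple convert_tuple_v3_to_v4(HeapTuple old_tuple);
    extern void update_indexes_for_moved_tuple(Relation rel, HeapTuple new_tuple);

    /* Move one tuple off a V3 page so the page can later be rewritten as V4. */
    static void
    move_one_tuple(Relation rel, Buffer old_buf, OffsetNumber victim_off)
    {
        Page            old_page = BufferGetPage(old_buf);
        ItemId          itemid = PageGetItemId(old_page, victim_off);
        HeapTupleData   oldtup;
        HeapTuple       newtup;

        oldtup.t_len = ItemIdGetLength(itemid);
        oldtup.t_data = (HeapTupleHeader) PageGetItem(old_page, itemid);
        ItemPointerSet(&oldtup.t_self, BufferGetBlockNumber(old_buf), victim_off);

        /* Build F', the V4 copy, and insert it on a page chosen via the FSM. */
        newtup = convert_tuple_v3_to_v4(&oldtup);
        simple_heap_insert(rel, newtup);                /* writes the copy, sets its xmin */
        update_indexes_for_moved_tuple(rel, newtup);    /* the corresponding index entries */

        /* Shorthand for "set xmax on the original F"; real code would need a
         * V3-aware way to mark the old tuple deleted.  Once it is dead and
         * vacuumed away, the page has room to be rewritten in V4 format. */
        simple_heap_delete(rel, &oldtup.t_self);
    }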
Re: [HACKERS] [WIP] In-place upgrade
Martijn van Oosterhout <[EMAIL PROTECTED]> writes: > On Wed, Nov 05, 2008 at 09:41:52PM +, Gregory Stark wrote: >> "Robert Haas" <[EMAIL PROTECTED]> writes: >> >> >> Problem is how to move tuple from page to another and keep indexes in >> >> sync. >> >> One solution is to perform some think like "update" operation on the >> >> tuple. >> >> But you need exclusive lock on the page and pin counter have to be zero. >> >> And >> >> question is where it is safe operation. >> > >> > But doesn't this problem go away if you do it in a transaction? You >> > set xmax on the old tuple, write the new tuple, and add index entries >> > just as you would for a normal update. >> >> But that doesn't actually solve the overflow problem on the old page... > > Sure it does. You move just enough tuples that you can convert the page > without an overflow. setting the xmax on a tuple doesn't "move" the tuple -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
On Wed, Nov 05, 2008 at 09:41:52PM +, Gregory Stark wrote: > "Robert Haas" <[EMAIL PROTECTED]> writes: > > >> Problem is how to move tuple from page to another and keep indexes in sync. > >> One solution is to perform some think like "update" operation on the tuple. > >> But you need exclusive lock on the page and pin counter have to be zero. > >> And > >> question is where it is safe operation. > > > > But doesn't this problem go away if you do it in a transaction? You > > set xmax on the old tuple, write the new tuple, and add index entries > > just as you would for a normal update. > > But that doesn't actually solve the overflow problem on the old page... Sure it does. You move just enough tuples that you can convert the page without an overflow. Have a nice day, -- Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines. signature.asc Description: Digital signature
Re: [HACKERS] [WIP] In-place upgrade
"Robert Haas" <[EMAIL PROTECTED]> writes: >> Problem is how to move tuple from page to another and keep indexes in sync. >> One solution is to perform some think like "update" operation on the tuple. >> But you need exclusive lock on the page and pin counter have to be zero. And >> question is where it is safe operation. > > But doesn't this problem go away if you do it in a transaction? You > set xmax on the old tuple, write the new tuple, and add index entries > just as you would for a normal update. But that doesn't actually solve the overflow problem on the old page... -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
> Problem is how to move tuple from page to another and keep indexes in sync. > One solution is to perform some think like "update" operation on the tuple. > But you need exclusive lock on the page and pin counter have to be zero. And > question is where it is safe operation. But doesn't this problem go away if you do it in a transaction? You set xmax on the old tuple, write the new tuple, and add index entries just as you would for a normal update. When the old tuple is no longer visible to any transaction, you nuke it. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
Zdenek Kotala <[EMAIL PROTECTED]> writes: > Martijn van Oosterhout napsal(a): >> Is this really such a big deal? You do the null-update on the last >> tuple of the page and then you do have enough room. So Phase one moves >> a few tuples to make room. Phase 2 actually converts the pages inplace. > Problem is how to move tuple from page to another and keep indexes in > sync. One solution is to perform some think like "update" operation on > the tuple. But you need exclusive lock on the page and pin counter > have to be zero. And question is where it is safe operation. Hmm. Well, it may be a nasty problem but you have to find a solution. We're not going to guarantee that no update ever expands the data ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
Martijn van Oosterhout napsal(a): On Wed, Nov 05, 2008 at 03:04:42PM +0100, Zdenek Kotala wrote: Greg Stark napsal(a): It is exceptional case between V3 and V4 and only on heap, because you save in varlena. But between V4 and V5 we will lost another 4 bytes in a page header -> page header will be 28 bytes long but tuple size is same. Try to get raw free space on each page in 8.3 database and you probably see a lot of pages where free space is 0. My last experience is something about 1-2% of pages. Is this really such a big deal? You do the null-update on the last tuple of the page and then you do have enough room. So Phase one moves a few tuples to make room. Phase 2 actually converts the pages inplace. The problem is how to move a tuple from one page to another and keep the indexes in sync. One solution is to perform something like an "update" operation on the tuple. But you need an exclusive lock on the page and the pin count has to be zero. And the question is where doing that is safe. Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
On Wed, Nov 05, 2008 at 03:04:42PM +0100, Zdenek Kotala wrote: > Greg Stark napsal(a): > It is exceptional case between V3 and V4 and only on heap, because you save > in varlena. But between V4 and V5 we will lost another 4 bytes in a page > header -> page header will be 28 bytes long but tuple size is same. > > Try to get raw free space on each page in 8.3 database and you probably see > a lot of pages where free space is 0. My last experience is something about > 1-2% of pages. Is this really such a big deal? You do the null-update on the last tuple of the page and then you do have enough room. So Phase one moves a few tuples to make room. Phase 2 actually converts the pages inplace. Have a nice day, -- Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines. signature.asc Description: Digital signature
Re: [HACKERS] [WIP] In-place upgrade
Greg Stark napsal(a): I don't think this really qualifies as "in place upgrade" since it would mean creating a whole second copy of all your data. And it's only online got read-only queries too. I think we need a way to upgrade the pages in place and deal with any overflow data as exceptional cases or else there's hardly much point in the exercise. It is an exceptional case between V3 and V4, and only on the heap, because you save space in varlena. But between V4 and V5 we will lose another 4 bytes in the page header -> the page header will be 28 bytes long while the tuple size stays the same. Try measuring the raw free space on each page in an 8.3 database and you will probably see a lot of pages where the free space is 0. In my last experiment it was roughly 1-2% of pages. Zdenek -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
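The free-space concern above reduces to simple arithmetic per page: conversion in place is possible only if the header growth plus the net per-tuple size change fits in the page's current free space. A sketch of that check follows, with the deltas passed in as parameters because the exact numbers depend on which version pair is involved; it is an illustration, not code from the patch.

    #include "postgres.h"
    #include "storage/bufpage.h"

    /*
     * Would every tuple on this old-format page still fit after conversion?
     * header_growth is how many bytes the page header grows; per_tuple_delta is
     * the net size change of each tuple (negative if tuples shrink, as heap
     * tuples did between V3 and V4).
     */
    static bool
    page_fits_after_conversion(Page page, int header_growth, int per_tuple_delta)
    {
        PageHeader      phdr = (PageHeader) page;
        int             freespace = phdr->pd_upper - phdr->pd_lower;
        int             ntuples = 0;
        OffsetNumber    off;
        OffsetNumber    maxoff = PageGetMaxOffsetNumber(page);

        for (off = FirstOffsetNumber; off <= maxoff; off = OffsetNumberNext(off))
        {
            if (ItemIdIsUsed(PageGetItemId(page, off)))
                ntuples++;
        }

        return freespace >= header_growth + ntuples * per_tuple_delta;
    }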
Re: [HACKERS] [WIP] In-place upgrade
I don't think this really qualifies as "in place upgrade" since it would mean creating a whole second copy of all your data. And it's only online for read-only queries too. I think we need a way to upgrade the pages in place and deal with any overflow data as exceptional cases or else there's hardly much point in the exercise. greg On 5 Nov 2008, at 07:32 AM, "Robert Haas" <[EMAIL PROTECTED]> wrote: An old page which never goes away. New page formats are introduced for a reason -- to support new features. An old page lying around indefinitely means some pages can't support those new features. Just as an example, DBAs may be surprised to find out that large swathes of their database are still not protected by CRC checksums months or years after having upgraded to 8.4 (or even 8.5 or 8.6 or ...). They would certainly want a way to ensure all their data is upgraded. OK, I see your point. In the absence of any old snapshots, convert-on-write allows you to forcibly upgrade the whole table by rewriting all of the tuples into new pages: UPDATE table SET col = col In the absence of page expansion, you can put logic into VACUUM to upgrade each page in place. If you have both old snapshots that you can't get rid of, and page expansion, then you have a big problem, which I guess brings us back to Heikki's point. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
Tom Lane napsal(a): I concur that I don't want to see this patch adding more than the absolute unavoidable minimum of overhead for data that meets the "current" layout definition. I'm disturbed by the proposal to stick overhead into tuple header access, for example. OK. I agree that it is overhead. However, the patch also contains a Tuple and Page API cleanup, which is a general improvement. All functions should use HeapTuple access, not HeapTupleHeader. I used functions in the patch because I added multi-version access, but they could be macros. The main change to the page API is the addition of two functions, PageGetHeapTuple and PageGetIndexTuple. I also add functions like PageItemIsDead and so on. These changes are not related only to the upgrade. I accept your complaints about tuples, but I think we should have a multi-version page access method. The main advantage is that indexes are readable without any problem. It helps mostly with TOAST chunk data access, and it is necessary for retoasting. Granted, it only works until somebody changes the btree on-disk format, but for now it helps. Zdenek -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
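For readers following the API discussion, the additions Zdenek describes amount to accessors that hand back whole tuples (and item status) instead of raw on-page pointers, so callers never touch the page layout directly. A sketch of what such declarations might look like, inferred only from the names mentioned in this thread; the signatures are assumptions, not the actual patch.

    #include "postgres.h"
    #include "access/htup.h"
    #include "access/itup.h"
    #include "storage/bufpage.h"

    /*
     * Hypothetical accessor signatures inferred from the discussion above.
     * Callers get back version-independent HeapTuple/IndexTuple objects; any
     * V3-to-V4 conversion would happen inside these functions, so the rest of
     * the code never inspects the on-page layout itself.
     */
    extern HeapTuple PageGetHeapTuple(Page page, OffsetNumber offnum);
    extern IndexTuple PageGetIndexTuple(Page page, OffsetNumber offnum);
    extern bool PageItemIsDead(Page page, OffsetNumber offnum);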
Re: [HACKERS] [WIP] In-place upgrade
Heikki Linnakangas napsal(a): Zdenek Kotala wrote: We've talked about this many times before, so I'm sure you know what my opinion is. Let me phrase it one more time: 1. You *will* need a function to convert a page from old format to new format. We do want to get rid of the old format pages eventually, whether it's during VACUUM, whenever a page is read in, or by using an extra utility. And that process needs to online. Please speak up now if you disagree with that. Yes, agreed. The basic idea is to create a new empty page and copy+convert the tuples into it; the new page then overwrites the old one. I already have code which converts a heap table (excluding arrays and composite datatypes). 2. It follows from point 1, that you *will* need to solve the problems with pages where the data doesn't fit on the page in new format, as well as converting TOAST data. Yes and no. It depends on whether we want to live with old pages forever. But I think converting all pages to the newest version is a good idea. We've discussed various solutions to those problems; it's not insurmountable. For the "data doesn't fit anymore" problem, a fairly simple solution is to run a pre-upgrade utility in the old version, that reserves some free space on each page, to make sure everything fits after converting to new format. I think that will not work. You also need to prevent PostgreSQL from putting any extra data on a page, which requires modifying the PostgreSQL code in the old branches. For TOAST, you can retoast tuples when the heap page is read in. Yes, you have to retoast it, which is the only possible method, but the problem is that you need a working TOAST index ... yeah, indexes are a different story. > I'm not sure what the problem with indexes is, > but you can split pages if necessary, for example. Indexes are a different story. As a first step I prefer to use REINDEX. But in the future I would prefer to extend pg_am and add ampageconvert, which will point to a conversion function. Maybe we can extend it now and keep this column empty. Assuming everyone agrees with point 1, could we focus on these issues? Yes. OK, I'm going to clean up the code which I have and I will send it soon. Tuple conversion is already part of the patch which I already sent. See access/heapam/htup_03.c. Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
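Zdenek's "create a new empty page and copy+convert the tuples into it" step can be sketched as below. convert_tuple_v3_to_v4() is a hypothetical stand-in for the per-tuple conversion code he refers to (htup_03.c), WAL logging and error handling are ignored, and the sketch assumes every line pointer on the page is in use and that everything fits; a real conversion must also preserve unused and dead slots so that existing TIDs stay valid.

    #include "postgres.h"
    #include "access/htup.h"
    #include "storage/bufpage.h"

    /* Hypothetical per-tuple converter (stands in for the htup_03.c code). */
    extern HeapTuple convert_tuple_v3_to_v4(HeapTupleHeader htup, Size len);

    /*
     * Rewrite a V3 heap page into a freshly initialized V4 page built in local
     * workspace, then copy the result back over the original.  Assumes every
     * line pointer is in use and that all converted tuples fit.
     */
    static void
    convert_heap_page_v3_to_v4(Page oldpage)
    {
        char            workspace[BLCKSZ];
        Page            newpage = (Page) workspace;
        OffsetNumber    off;
        OffsetNumber    maxoff = PageGetMaxOffsetNumber(oldpage);

        PageInit(newpage, BLCKSZ, 0);       /* empty page in the current format */

        for (off = FirstOffsetNumber; off <= maxoff; off = OffsetNumberNext(off))
        {
            ItemId      itemid = PageGetItemId(oldpage, off);
            HeapTuple   newtup;

            newtup = convert_tuple_v3_to_v4(
                (HeapTupleHeader) PageGetItem(oldpage, itemid),
                ItemIdGetLength(itemid));

            /* keep the same offset so existing TIDs remain valid */
            if (PageAddItem(newpage, (Item) newtup->t_data, newtup->t_len,
                            off, false, true) == InvalidOffsetNumber)
                elog(ERROR, "converted tuple does not fit on page");
        }

        memcpy(oldpage, newpage, BLCKSZ);   /* overwrite the old page in place */
    }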
Re: [HACKERS] [WIP] In-place upgrade
> An old page which never goes away. New page formats are introduced for a > reason -- to support new features. An old page lying around indefinitely means > some pages can't support those new features. Just as an example, DBAs may be > surprised to find out that large swathes of their database are still not > protected by CRC checksums months or years after having upgraded to 8.4 (or > even 8.5 or 8.6 or ...). They would certainly want a way to ensure all their > data is upgraded. OK, I see your point. In the absence of any old snapshots, convert-on-write allows you to forcibly upgrade the whole table by rewriting all of the tuples into new pages: UPDATE table SET col = col In the absence of page expansion, you can put logic into VACUUM to upgrade each page in place. If you have both old snapshots that you can't get rid of, and page expansion, then you have a big problem, which I guess brings us back to Heikki's point. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
Gregory Stark wrote: "Joshua D. Drake" <[EMAIL PROTECTED]> writes: Gregory Stark wrote: "Robert Haas" <[EMAIL PROTECTED]> writes: An old page which never goes away. New page formats are introduced for a reason -- to support new features. An old page lying around indefinitely means some pages can't support those new features. Just as an example, DBAs may be surprised to find out that large swathes of their database are still not protected by CRC checksums months or years after having upgraded to 8.4 (or even 8.5 or 8.6 or ...). They would certainly want a way to ensure all their data is upgraded. Then provide a manual mechanism to convert all pages? The origin of this thread was the dispute over this claim: 1. You *will* need a function to convert a page from old format to new format. We do want to get rid of the old format pages eventually, whether it's during VACUUM, whenever a page is read in, or by using an extra utility. And that process needs to online. Please speak up now if you disagree with that. I agree. Joshua D. Drake -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
"Joshua D. Drake" <[EMAIL PROTECTED]> writes: > Gregory Stark wrote: >> "Robert Haas" <[EMAIL PROTECTED]> writes: > >> An old page which never goes away. New page formats are introduced for a >> reason -- to support new features. An old page lying around indefinitely >> means >> some pages can't support those new features. Just as an example, DBAs may be >> surprised to find out that large swathes of their database are still not >> protected by CRC checksums months or years after having upgraded to 8.4 (or >> even 8.5 or 8.6 or ...). They would certainly want a way to ensure all their >> data is upgraded. > > Then provide a manual mechanism to convert all pages? The origin of this thread was the dispute over this claim: 1. You *will* need a function to convert a page from old format to new format. We do want to get rid of the old format pages eventually, whether it's during VACUUM, whenever a page is read in, or by using an extra utility. And that process needs to online. Please speak up now if you disagree with that. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
Gregory Stark wrote: "Robert Haas" <[EMAIL PROTECTED]> writes: An old page which never goes away. New page formats are introduced for a reason -- to support new features. An old page lying around indefinitely means some pages can't support those new features. Just as an example, DBAs may be surprised to find out that large swathes of their database are still not protected by CRC checksums months or years after having upgraded to 8.4 (or even 8.5 or 8.6 or ...). They would certainly want a way to ensure all their data is upgraded. Then provide a manual mechanism to convert all pages? Joshua D. Drake -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
"Robert Haas" <[EMAIL PROTECTED]> writes: >>> No, that's not what I'm suggesting. My thought was that any V3 page >>> would be treated as if it were completely full, with the exception of >>> a completely empty page which can be reinitialized as a V4 page. So >>> you would never add any tuples to a V3 page, but you would need to >>> update xmax, hint bits, etc. Eventually when all the tuples were dead >>> you could reuse the page. >> >> But there's no guarantee that will ever happen. Heikki claimed you would need >> a mechanism to convert the page some day and you said you proposed a system >> where that wasn't true. > > What's the scenario you're concerned about? An old snapshot that > never goes away? An old page which never goes away. New page formats are introduced for a reason -- to support new features. An old page lying around indefinitely means some pages can't support those new features. Just as an example, DBAs may be surprised to find out that large swathes of their database are still not protected by CRC checksums months or years after having upgraded to 8.4 (or even 8.5 or 8.6 or ...). They would certainly want a way to ensure all their data is upgraded. > Can we lock the old and new pages, move the tuple to a V4 page, and > update index entries without changing xmin/xmax? Not exactly. But regardless -- the point is we need to do something. (And then the argument goes that since we *have* to do that then we needn't bother with doing anything else. At least if we do it's just an optimization over just doing the whole page right away.) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
>> No, that's not what I'm suggesting. My thought was that any V3 page >> would be treated as if it were completely full, with the exception of >> a completely empty page which can be reinitialized as a V4 page. So >> you would never add any tuples to a V3 page, but you would need to >> update xmax, hint bits, etc. Eventually when all the tuples were dead >> you could reuse the page. > > But there's no guarantee that will ever happen. Heikki claimed you would need > a mechanism to convert the page some day and you said you proposed a system > where that wasn't true. What's the scenario you're concerned about? An old snapshot that never goes away? Can we lock the old and new pages, move the tuple to a V4 page, and update index entries without changing xmin/xmax? ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
"Robert Haas" <[EMAIL PROTECTED]> writes: >>> Maybe. The difference is that I'm talking about converting tuples, >>> not pages, so "What happens when the data doesn't fit on the new >>> page?" is a meaningless question. >> >> No it's not, because as you pointed out you still need a way for the user to >> force it to happen sometime. Unless you're going to be happy with telling >> users they need to update all their tuples which would not be an online >> process. >> >> In any case it sounds like you're saying you want to allow multiple versions >> of tuples on the same page -- which a) would be much harder and b) doesn't >> solve the problem since the page still has to be converted sometime anyways. > > No, that's not what I'm suggesting. My thought was that any V3 page > would be treated as if it were completely full, with the exception of > a completely empty page which can be reinitialized as a V4 page. So > you would never add any tuples to a V3 page, but you would need to > update xmax, hint bits, etc. Eventually when all the tuples were dead > you could reuse the page. But there's no guarantee that will ever happen. Heikki claimed you would need a mechanism to convert the page some day and you said you proposed a system where that wasn't true. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
>> Maybe. The difference is that I'm talking about converting tuples, >> not pages, so "What happens when the data doesn't fit on the new >> page?" is a meaningless question. > > No it's not, because as you pointed out you still need a way for the user to > force it to happen sometime. Unless you're going to be happy with telling > users they need to update all their tuples which would not be an online > process. > > In any case it sounds like you're saying you want to allow multiple versions > of tuples on the same page -- which a) would be much harder and b) doesn't > solve the problem since the page still has to be converted sometime anyways. No, that's not what I'm suggesting. My thought was that any V3 page would be treated as if it were completely full, with the exception of a completely empty page which can be reinitialized as a V4 page. So you would never add any tuples to a V3 page, but you would need to update xmax, hint bits, etc. Eventually when all the tuples were dead you could reuse the page. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
"Robert Haas" <[EMAIL PROTECTED]> writes: >>> Well, I just proposed an approach that doesn't work this way, so I >>> guess I'll have to put myself in the disagree category, or anyway yet >>> to be convinced. As long as you can move individual tuples onto new >>> pages, you can eventually empty V3 pages and reinitialize them as new, >>> empty V4 pages. You can force that process along via, say, VACUUM, >> >> No, if you can force that process along via some command, whatever it is, >> then >> you're still in the category he described. > > Maybe. The difference is that I'm talking about converting tuples, > not pages, so "What happens when the data doesn't fit on the new > page?" is a meaningless question. No it's not, because as you pointed out you still need a way for the user to force it to happen sometime. Unless you're going to be happy with telling users they need to update all their tuples which would not be an online process. In any case it sounds like you're saying you want to allow multiple versions of tuples on the same page -- which a) would be much harder and b) doesn't solve the problem since the page still has to be converted sometime anyways. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
"Robert Haas" <[EMAIL PROTECTED]> writes: >> We've talked about this many times before, so I'm sure you know what my >> opinion is. Let me phrase it one more time: >> >> 1. You *will* need a function to convert a page from old format to new >> format. We do want to get rid of the old format pages eventually, whether >> it's during VACUUM, whenever a page is read in, or by using an extra >> utility. And that process needs to online. Please speak up now if you >> disagree with that. > > Well, I just proposed an approach that doesn't work this way, so I > guess I'll have to put myself in the disagree category, or anyway yet > to be convinced. As long as you can move individual tuples onto new > pages, you can eventually empty V3 pages and reinitialize them as new, > empty V4 pages. You can force that process along via, say, VACUUM, No, if you can force that process along via some command, whatever it is, then you're still in the category he described. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
>> Well, I just proposed an approach that doesn't work this way, so I >> guess I'll have to put myself in the disagree category, or anyway yet >> to be convinced. As long as you can move individual tuples onto new >> pages, you can eventually empty V3 pages and reinitialize them as new, >> empty V4 pages. You can force that process along via, say, VACUUM, > > No, if you can force that process along via some command, whatever it is, then > you're still in the category he described. Maybe. The difference is that I'm talking about converting tuples, not pages, so "What happens when the data doesn't fit on the new page?" is a meaningless question. Since that seemed to be Heikki's main concern, I thought we must be talking about different things. My thought was that the code path for converting a tuple would be very similar to what heap_update does today, and large tuples would be handled via TOAST just as they are now - by converting the relation one tuple at a time, you might end up with a new relation that has either more or fewer pages than the old relation, and it really doesn't matter which. I haven't really thought through all of the other kinds of things that might need to be converted, though. That's where it would be useful for someone more experienced to weigh in on indexes, etc. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
> That's sane *if* you can guarantee that only negligible overhead is > added for accessing data that is in the up-to-date format. I don't > think that will be the case if we start putting version checks into > every tuple access macro. Yes, the point is that you'll read the page as V3 or V4, whichever it is, but if it's V3, you'll convert the tuples to V4 format before you try to do anything with them (for example by modifying ExecStoreTuple to copy any V3 tuple into a palloc'd buffer, which fits nicely into what that function already does). ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
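A sketch of the ExecStoreTuple-based idea described above: the slot keeps its existing contract, and the only new behaviour is that a tuple read from a V3 page is first copied into a palloc'd V4-format buffer which the slot is then told to free. heap_tuple_from_v3() and the is_v3 flag are assumptions made for the illustration; the real refactoring may look different.

    #include "postgres.h"
    #include "access/heapam.h"
    #include "executor/tuptable.h"

    /* Hypothetical: build a palloc'd V4-format copy of a V3-format tuple. */
    extern HeapTuple heap_tuple_from_v3(HeapTuple v3tup);

    /*
     * Wrapper showing where the conversion could slot in.  The caller passes
     * is_v3 based on the layout version of the page the tuple was read from.
     */
    static TupleTableSlot *
    ExecStoreVersionedTuple(HeapTuple tuple, TupleTableSlot *slot,
                            Buffer buffer, bool shouldFree, bool is_v3)
    {
        if (is_v3)
        {
            HeapTuple   converted = heap_tuple_from_v3(tuple);

            if (shouldFree)
                heap_freetuple(tuple);

            /* the copy is palloc'd and no longer points into the buffer,
             * so the slot must free it and needs no buffer pin */
            return ExecStoreTuple(converted, slot, InvalidBuffer, true);
        }

        return ExecStoreTuple(tuple, slot, buffer, shouldFree);
    }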
Re: [HACKERS] [WIP] In-place upgrade
"Robert Haas" <[EMAIL PROTECTED]> writes: > Well, I just proposed an approach that doesn't work this way, so I > guess I'll have to put myself in the disagree category, or anyway yet > to be convinced. As long as you can move individual tuples onto new > pages, you can eventually empty V3 pages and reinitialize them as new, > empty V4 pages. You can force that process along via, say, VACUUM, > but in the meantime you can still continue to read the old pages > without being forced to change them to the new format. That's not the > only possible approach, but it's not obvious to me that it's insane. > If you think it's a non-starter, it would be good to know why. That's sane *if* you can guarantee that only negligible overhead is added for accessing data that is in the up-to-date format. I don't think that will be the case if we start putting version checks into every tuple access macro. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
> We've talked about this many times before, so I'm sure you know what my > opinion is. Let me phrase it one more time: > > 1. You *will* need a function to convert a page from old format to new > format. We do want to get rid of the old format pages eventually, whether > it's during VACUUM, whenever a page is read in, or by using an extra > utility. And that process needs to online. Please speak up now if you > disagree with that. Well, I just proposed an approach that doesn't work this way, so I guess I'll have to put myself in the disagree category, or anyway yet to be convinced. As long as you can move individual tuples onto new pages, you can eventually empty V3 pages and reinitialize them as new, empty V4 pages. You can force that process along via, say, VACUUM, but in the meantime you can still continue to read the old pages without being forced to change them to the new format. That's not the only possible approach, but it's not obvious to me that it's insane. If you think it's a non-starter, it would be good to know why. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
Zdenek Kotala wrote: Robert Haas napsal(a): Really, what I'd ideally like to see here is a system where the V3 code is in essence error-recovery code. Everything should be V4-only unless you detect a V3 page, and then you error out (if in-place upgrade is not enabled) or jump to the appropriate V3-aware code (if in-place upgrade is enabled). In theory, with a system like this, it seems like the overhead for V4 ought to be no more than the cost of checking the page version on each page read, which is a cheap sanity check we'd be willing to pay for anyway, and trivial in cost. OK. It was original idea to make "Convert on read" which has several problems with no easy solution. One is that new data does not fit on the page and second big problem is how to convert TOAST table data. Another problem which is general is how to convert indexes... We've talked about this many times before, so I'm sure you know what my opinion is. Let me phrase it one more time: 1. You *will* need a function to convert a page from old format to new format. We do want to get rid of the old format pages eventually, whether it's during VACUUM, whenever a page is read in, or by using an extra utility. And that process needs to be online. Please speak up now if you disagree with that. 2. It follows from point 1, that you *will* need to solve the problems with pages where the data doesn't fit on the page in new format, as well as converting TOAST data. We've discussed various solutions to those problems; it's not insurmountable. For the "data doesn't fit anymore" problem, a fairly simple solution is to run a pre-upgrade utility in the old version, that reserves some free space on each page, to make sure everything fits after converting to new format. For TOAST, you can retoast tuples when the heap page is read in. I'm not sure what the problem with indexes is, but you can split pages if necessary, for example. Assuming everyone agrees with point 1, could we focus on these issues? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
> I see. But Vacuum and other internals function access heap pages directly > without ExecStoreTuple. Right. I don't think there's any getting around the fact that any function which accesses heap pages directly is going to need modification. The key is to make those modifications as non-invasive as possible. For example, in the case of vacuum, as soon as it detects that a V3 page has been read, it should call a special function whose only purpose in life is to move the data out of that V3 page and onto one or more V4 pages, and return. What you shouldn't do is try to make the regular vacuum code handle both V3 and V4 pages, because that will lead to code that may be slow and will almost certainly be complicated and difficult to maintain. I'll read through the rest of this when I have a bit more time. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
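To illustrate the "special function whose only purpose in life is to move the data out of that V3 page" idea: the change to vacuum itself stays tiny, one version test per page, with everything V3-specific hidden behind a separate routine. migrate_v3_page() and the placement of the test are assumptions for illustration; this is not the actual lazy-vacuum code.

    #include "postgres.h"
    #include "storage/bufmgr.h"
    #include "storage/bufpage.h"
    #include "utils/rel.h"

    /* Hypothetical: move every live tuple off a V3 page (via the update-like
     * mechanism discussed upthread) so the page can be reinitialized as V4. */
    extern void migrate_v3_page(Relation onerel, Buffer buf);

    static void
    vacuum_scan_page(Relation onerel, Buffer buf)
    {
        Page    page = BufferGetPage(buf);

        if (PageGetPageLayoutVersion(page) != PG_PAGE_LAYOUT_VERSION)
        {
            migrate_v3_page(onerel, buf);   /* all V3 handling lives here */
            return;                         /* the regular vacuum code never sees V3 */
        }

        /* ... normal, V4-only vacuum processing of the page continues here ... */
    }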
Re: [HACKERS] [WIP] In-place upgrade
Robert Haas napsal(a): OK. It was original idea to make "Convert on read" which has several problems with no easy solution. One is that new data does not fit on the page and second big problem is how to convert TOAST table data. Another problem which is general is how to convert indexes... Convert on read has minimal impact on core when latest version is processed. But problem is what happen when you need to migrate tuple form page to new one modify index and also needs convert toast value(s)... Problem is that response could be long in some query, because it invokes a lot of changes and conversion. I think in corner case it could requires converts all index when you request one record. I don't think I'm proposing convert on read, exactly. If you actually try to convert the entire page when you read it in, I think you're doomed to failure, because, as you rightly point out, there is absolutely no guarantee that the page contents in their new format will still fit into one block. I think what you want to do is convert the structures within the page one by one as you read them out of the page. The proposed refactoring of ExecStoreTuple will do exactly this, for example. I see. But VACUUM and other internal functions access heap pages directly, without ExecStoreTuple. However, you point to an idea which I am currently thinking about too. Here is my version: if you look into the new page API, it has PageGetHeapTuple, which could do the conversion job. The problem is that you don't have the relation info there, so you cannot convert the data, but the transaction information can be converted. I am thinking about a modification of the HeapTupleData structure. It will have a pointer to the transaction info, t_transinfo, which will point to the on-page tuple for V4. For V3, the PageGetHeapTuple function will allocate memory and put the converted data there. ExecStoreTuple will finally convert the data, because it knows about the relation, and it does not make sense to convert data early - who wants to convert invisible or dead data? With this approach a tuple will be processed the same way as V4, without any overhead (there will be a small overhead from allocating and freeing HeapTupleData in some places - mostly vacuum). Only multi-version access will be driven on a per-page basis. Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
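A sketch of the HeapTupleData change Zdenek outlines: the tuple grows a t_transinfo pointer for the transaction columns, which for a V4 tuple could point at (or be derived directly from) the on-page header, while for a V3 tuple it points at a small palloc'd, already-converted copy filled in by PageGetHeapTuple. The HeapTupleTransInfo type and its field list are invented here to mirror his description; the actual patch may differ.

    #include "postgres.h"
    #include "access/htup.h"

    /* Invented for illustration: just the transaction-related columns. */
    typedef struct HeapTupleTransInfo
    {
        TransactionId   xmin;
        TransactionId   xmax;
        CommandId       cid;            /* cmin/cmax/combo cid */
        uint16          infomask;
    } HeapTupleTransInfo;

    /* Hypothetical variant of HeapTupleData following the description above. */
    typedef struct VersionedHeapTupleData
    {
        uint32              t_len;          /* length of *t_data */
        ItemPointerData     t_self;         /* SelfItemPointer */
        Oid                 t_tableOid;     /* table the tuple came from */
        HeapTupleHeader     t_data;         /* on-page tuple header (any version) */
        HeapTupleTransInfo *t_transinfo;    /* V4: derived from the on-page header;
                                             * V3: palloc'd, converted copy filled
                                             * in by PageGetHeapTuple */
    } VersionedHeapTupleData;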
Re: [HACKERS] [WIP] In-place upgrade
> OK. It was original idea to make "Convert on read" which has several > problems with no easy solution. One is that new data does not fit on the > page and second big problem is how to convert TOAST table data. Another > problem which is general is how to convert indexes... > > Convert on read has minimal impact on core when latest version is processed. > But problem is what happen when you need to migrate tuple form page to new > one modify index and also needs convert toast value(s)... Problem is that > response could be long in some query, because it invokes a lot of changes > and conversion. I think in corner case it could requires converts all index > when you request one record. I don't think I'm proposing convert on read, exactly. If you actually try to convert the entire page when you read it in, I think you're doomed to failure, because, as you rightly point out, there is absolutely no guarantee that the page contents in their new format will still fit into one block. I think what you want to do is convert the structures within the page one by one as you read them out of the page. The proposed refactoring of ExecStoreTuple will do exactly this, for example. HEAD uses a pointer into the actual buffer for a V4 tuple that comes from an existing relation, and a pointer to a palloc'd structure for a tuple that is generated during query execution. The proposed refactoring will keep these rules, plus add a new rule that if you happen to read a V3 page, you will palloc space for a new V4 tuple that is semantically equivalent to the V3 tuple on the page, and use that pointer instead. That, it seems to me, is exactly the right balance - the PAGE is still a V3 page, but all of the tuples that the upper-level code ever sees are V4 tuples. I'm not sure how far this particular approach can be generalized. ExecStoreTuple has the advantage that it already has to deal with both direct buffer pointers and palloc'd structures, so the code doesn't need to be much more complex to handle this case as well. I think the thing to do is go through and scrutinize all of the ReadBuffer call sites and figure out an approach to each one. I haven't looked at your latest code yet, so you may have already done this, but just for example, RelationGetBufferForTuple should probably just reject any V3 pages encountered as if they were full, including updating the FSM where appropriate. I would think that it would be possible to implement that with almost zero performance impact. I'm happy to look at and discuss the problem cases with you, and hopefully others will chime in as well since my knowledge of the code is far from exhaustive. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
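The RelationGetBufferForTuple suggestion boils down to one extra test wherever a candidate page's free space is examined: an old-format page reports zero usable space, so new tuples never land on it and it can drain out and be converted. A sketch of that test as a helper; the helper name and its placement are assumptions, not the real change.

    #include "postgres.h"
    #include "storage/bufpage.h"

    /*
     * Hypothetical replacement for the free-space check used when choosing a
     * target page for a new tuple: old-format pages report no usable space, so
     * inserts never land on them and they can drain out and be converted.
     */
    static Size
    usable_free_space(Page page)
    {
        if (PageGetPageLayoutVersion(page) != PG_PAGE_LAYOUT_VERSION)
            return 0;                       /* treat V3 pages as completely full */

        return PageGetHeapFreeSpace(page);  /* normal behaviour for current pages */
    }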
Re: [HACKERS] [WIP] In-place upgrade
Robert Haas napsal(a): Really, what I'd ideally like to see here is a system where the V3 code is in essence error-recovery code. Everything should be V4-only unless you detect a V3 page, and then you error out (if in-place upgrade is not enabled) or jump to the appropriate V3-aware code (if in-place upgrade is enabled). In theory, with a system like this, it seems like the overhead for V4 ought to be no more than the cost of checking the page version on each page read, which is a cheap sanity check we'd be willing to pay for anyway, and trivial in cost. OK. The original idea was to do "convert on read", which has several problems with no easy solution. One is that the new data may not fit on the page, and a second big problem is how to convert TOAST table data. Another, more general, problem is how to convert indexes... Convert on read has minimal impact on the core code when the latest version is being processed. But the problem is what happens when you need to migrate a tuple from one page to a new one, modify the indexes, and also convert the TOAST value(s)... The problem is that the response time of some queries could become long, because a single query may trigger a lot of changes and conversions. I think in a corner case it could require converting all the indexes when you request one record. Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
> We already do check the page version on read-in --- see PageHeaderIsValid. Right, but the only place this is called is in ReadBuffer_common, which doesn't seem like a suitable place to deal with the possibility of a V3 page since you don't yet know what you plan to do with it. I'm not quite sure what the right solution to that problem is... >> But I think we probably need some input from -core on this topic as well. > I concur that I don't want to see this patch adding more than the > absolute unavoidable minimum of overhead for data that meets the > "current" layout definition. I'm disturbed by the proposal to stick > overhead into tuple header access, for example. ...but it seems like we both agree that conditionalizing heap tuple header access on page version is not the right answer. Based on that, I'm going to move the "htup and bufpage API clean up" patch to "Returned with feedback" and continue reviewing the remainder of these patches. As I'm looking at this, I'm realizing another problem - there is a lot of code that looks like this:

void
HeapTupleSetXmax(HeapTuple tuple, TransactionId xmax)
{
    switch (tuple->t_ver)
    {
        case 4:
            tuple->t_data->t_choice.t_heap.t_xmax = xmax;
            break;
        case 3:
            TPH03(tuple)->t_choice.t_heap.t_xmax = xmax;
            break;
        default:
            elog(PANIC, "HeapTupleSetXmax is not supported.");
    }
}

TPH03 is a macro that is casting tuple->t_data to HeapTupleHeader_03. Unless I'm missing something, that means that given an arbitrary pointer to HeapTuple, there is absolutely no guarantee that tuple->t_data->t_choice actually points to that field at all. It will if tuple->t_ver happens to be 4 OR if HeapTupleHeader and HeapTupleHeader_03 happen to agree on where t_choice is; otherwise it points to some other member of HeapTupleHeader_03, or off the end of the structure. To me that seems unacceptably fragile, because it means the compiler can't warn us that we're using a pointer inappropriately. If we truly want to be safe here then we need to create an opaque HeapTupleHeader structure that contains only those elements that HeapTupleHeader_03 and HeapTupleHeader_04 have in common, and cast BOTH of them after checking the version. That way if someone writes a function that attempts to dereference a HeapTupleHeader without going through the API, it will fail to compile rather than mostly working but possibly failing on a V3 page. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
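A sketch of the "common elements only" idea: expose a type whose body is invisible to most of the code (or contains only fields every supported header version lays out identically), keep the full per-version structs private to the access API, and cast to the versioned type only after checking the version. The struct bodies below are placeholders (a single xmax field stands in for the real contents), so this shows the casting discipline rather than a concrete layout proposal.

    #include "postgres.h"

    /* What code outside the tuple-access API would see: the type exists, but
     * none of the version-specific fields are reachable directly. */
    typedef struct HeapTupleHeaderCommon HeapTupleHeaderCommon;

    /* Per-version layouts, private to the access API.  The single xmax field
     * is a placeholder for the real V3/V4 contents. */
    typedef struct HeapTupleHeaderData_03 { TransactionId xmax; /* ... */ } HeapTupleHeaderData_03;
    typedef struct HeapTupleHeaderData_04 { TransactionId xmax; /* ... */ } HeapTupleHeaderData_04;

    static void
    HeapTupleHeaderCommonSetXmax(int t_ver, HeapTupleHeaderCommon *hdr, TransactionId xmax)
    {
        switch (t_ver)
        {
            case 4:
                ((HeapTupleHeaderData_04 *) hdr)->xmax = xmax;  /* cast only after the check */
                break;
            case 3:
                ((HeapTupleHeaderData_03 *) hdr)->xmax = xmax;
                break;
            default:
                elog(PANIC, "unsupported tuple header version %d", t_ver);
        }
    }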
Re: [HACKERS] [WIP] In-place upgrade
"Robert Haas" <[EMAIL PROTECTED]> writes: > Really, what I'd ideally like to see here is a system where the V3 > code is in essence error-recovery code. Everything should be V4-only > unless you detect a V3 page, and then you error out (if in-place > upgrade is not enabled) or jump to the appropriate V3-aware code (if > in-place upgrade is enabled). In theory, with a system like this, it > seems like the overhead for V4 ought to be no more than the cost of > checking the page version on each page read, which is a cheap sanity > check we'd be willing to pay for anyway, and trivial in cost. We already do check the page version on read-in --- see PageHeaderIsValid. > But I think we probably need some input from -core on this topic as well. I concur that I don't want to see this patch adding more than the absolute unavoidable minimum of overhead for data that meets the "current" layout definition. I'm disturbed by the proposal to stick overhead into tuple header access, for example. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
> You need to apply also two other patches: > which are located here: > http://wiki.postgresql.org/wiki/CommitFestInProgress#Upgrade-in-place_and_related_issues > I moved one related patch from another category here to correct place. Just to confirm, which two? > http://git.postgresql.org/?p=~davidfetter/upgrade_in_place/.git;a=snapshot;h=c72bafada59ed278ffac59657c913bc375f77808;sf=tgz > > It should contains every think including yesterdays improvements (delete, > insert, update works - inser/update only on table without index). Wow, sounds like great improvements. I understand your difficulties in keeping up with HEAD, but I hope we can figure out some solution, because right now I have a diff (that I can't apply) and a tarball (that I can't diff) and that is not ideal for reviewing. > Yeah, it is most difficult part :-) find correct names for it. I think that > each version of structure should have version suffix including lastone. And > of cource the last one we should have a general name without suffix - see > example: > > typedef struct PageHeaderData_04 { ...} PageHeaderData_04 > typedef struct PageHeaderData_03 { ...} PageHeaderData_03 > typedef PageHeaderData_04 PageHeaderData > > This allows you exactly specify version on places where you need it and keep > general name where version is not relevant. That doesn't make sense to me. If PageHeaderData and PageHeaderData_04 are the same type, how do you decide which one to use in any particular place in the code? > How suffix should looks it another question. I prefer to have 04 not only 4. > What's about PageHeaderData_V04? I prefer "V" as a delimiter rather than "_" because that makes it more clear that the number which follows is a version number, but I think "_V" is overkill. However, I don't really want to argue the point; I'm just throwing in my $0.02 and I am sure others will have their own views as well. > By the way what YMMV means? "Your Mileage May Vary." http://www.urbandictionary.com/define.php?term=YMMV >> I am pretty skeptical of the idea that all of the HeapTuple* functions >> can just be conditionalized on the page version and everything will >> Just Work. It seems like that is too low a level to be worrying about >> such things. Even if it happens to work for the changes between V3 >> and V4, what happens when V5 or V6 is changed in such a way that the >> answer to HeapTupleIsWhatever is neither "Yes" nor "No", but rather >> "Maybe" or "Seven"? The performance hit also sounds painful. I don't >> have a better idea right now though... > > OK. Currently it works (or I hope that it works). If somebody in a future > invent some special change, i think in most (maybe all) cases there will be > possible mapping. > > The speed is key point. When I check it last time I go 1% performance drop > in fresh database. I think 1% is good price for in-place online upgrade. I think that's arguable and something that needs to be more broadly discussed. I wouldn't be keen to pay a 1% performance drop for this feature, because it's not a feature I really need. Sure, in-place upgrade would be nice to have, but for me, dump and reload isn't a huge problem. It's a lot better than the 5% number you quoted previously, but I'm not sure whether it is good enough, I would feel more comfortable if the feature could be completely disabled via compile-time defines. Then you could build the system either with or without in-place upgrade, according to your needs. But I don't think that's very practical with HeapTuple* as functions. 
You could conditionalize away the switch, but the function call overhead would remain. To get rid of that, you'd need some enormous, fragile hack that I don't even want to contemplate. Really, what I'd ideally like to see here is a system where the V3 code is in essence error-recovery code. Everything should be V4-only unless you detect a V3 page, and then you error out (if in-place upgrade is not enabled) or jump to the appropriate V3-aware code (if in-place upgrade is enabled). In theory, with a system like this, it seems like the overhead for V4 ought to be no more than the cost of checking the page version on each page read, which is a cheap sanity check we'd be willing to pay for anyway, and trivial in cost. But I think we probably need some input from -core on this topic as well. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
Big thanks for the review. Robert Haas napsal(a): I tried to apply this patch to CVS HEAD and it blew up all over the place. It doesn't seem to be intended to apply against CVS HEAD; for example, I don't have backend/access/heap/htup.c at all, so can't apply changes to that file. You also need to apply two other patches, which are located here: http://wiki.postgresql.org/wiki/CommitFestInProgress#Upgrade-in-place_and_related_issues I moved one related patch from another category to the correct place there. The problem is that it is difficult to keep it in sync with HEAD, because a lot of things change there. That is the reason why I also put everything into a GIT repository, but ... I was able to clone the GIT repository with the following command... git clone http://git.postgresql.org/git/~davidfetter/upgrade_in_place/.git ...but now I'm confused, because I don't see the changes from the diff reflected in the resulting tree. As you can see, I am not a git wizard. Any help would be appreciated. I'm a GIT newbie; I use Mercurial for development and manually applied the changes into GIT. I have asked David Fetter for help with getting the correct clone back. In the meantime you can download a tarball: http://git.postgresql.org/?p=~davidfetter/upgrade_in_place/.git;a=snapshot;h=c72bafada59ed278ffac59657c913bc375f77808;sf=tgz It should contain everything, including yesterday's improvements (delete, insert, and update work - insert/update only on tables without indexes). Here are a few initial thoughts based mostly on reading the diff: In the minor nit department, I don't really like the idea of PageHeaderData_04, SizeOfPageHeaderData04, PageLayoutIsValid_04, etc. I think the latest version should just be PageHeaderData and SizeOfPageHeaderData, and previous versions should be, e.g. PageHeaderDataV3. It looks to me like this would cut a few hunks out of this and maybe make it a bit easier to understand what is going on. At any rate, if we are going to stick with an explicit version number in both versions, it should be marked in a consistent way, not _04 sometimes and just 04 other times. My suggestion is e.g. "V4" but YMMV. Yeah, that is the most difficult part :-) - finding the correct names. I think that each version of a structure should have a version suffix, including the last one. And of course for the last one we should also have a general name without a suffix - see this example:

typedef struct PageHeaderData_04 { ... } PageHeaderData_04;
typedef struct PageHeaderData_03 { ... } PageHeaderData_03;
typedef PageHeaderData_04 PageHeaderData;

This allows you to specify the exact version in the places where you need it and keep the general name where the version is not relevant. How the suffix should look is another question. I prefer to have 04, not just 4. What about PageHeaderData_V04? By the way, what does YMMV mean? The changes to nodeIndexscan.c and nodeSeqscan.c are worrisome to me. It looks like the added code is (nearly?) identical in both places, so probably it needs to be refactored to avoid code duplication. I'm also a bit skeptical about the idea of doing the tuple conversion here. Why here rather than ExecStoreTuple()? If you decide to convert the tuple, you can palloc the new one, pfree the old one if ShouldFree is set, and reset shouldFree to true. Good point. I had considered that as one variant, and looking at it more closely now, it really is a much better place. It should also fix the problem of REINDEX not working. I will move it. I am pretty skeptical of the idea that all of the HeapTuple* functions can just be conditionalized on the page version and everything will Just Work. 
It seems like that is too low a level to be worrying about such things. Even if it happens to work for the changes between V3 and V4, what happens when V5 or V6 is changed in such a way that the answer to HeapTupleIsWhatever is neither "Yes" nor "No", but rather "Maybe" or "Seven"? The performance hit also sounds painful. I don't have a better idea right now though... OK. Currently it works (or I hope that it works). If somebody invents some special change in the future, I think in most (maybe all) cases a mapping will still be possible. Speed is the key point. When I last measured it I got a 1% performance drop in a fresh database. I think 1% is a good price for in-place online upgrade. I think it's going to be absolutely imperative to begin vacuuming away old V3 pages as quickly as possible after the upgrade. If you go with the approach of converting the tuple in, or just before, ExecStoreTuple, then you're going to introduce a lot of overhead when working with V3 pages. I think that's fine. You should plan to do your in-place upgrade at 1AM on Christmas morning (or whenever your load hits rock bottom...) and immediately start converting the database, starting with your most important and smallest tables. In fact, I would look whenever possible for ways to make the V4 case a fast-path and just accept that the system is going to labor a bit when dealing with V3 stuff. A
Re: [HACKERS] [WIP] In-place upgrade
I tried to apply this patch to CVS HEAD and it blew up all over the place. It doesn't seem to be intended to apply against CVS HEAD; for example, I don't have backend/access/heap/htup.c at all, so can't apply changes to that file. I was able to clone the GIT repository with the following command... git clone http://git.postgresql.org/git/~davidfetter/upgrade_in_place/.git ...but now I'm confused, because I don't see the changes from the diff reflected in the resulting tree. As you can see, I am not a git wizard. Any help would be appreciated. Here are a few initial thoughts based mostly on reading the diff: In the minor nit department, I don't really like the idea of PageHeaderData_04, SizeOfPageHeaderData04, PageLayoutIsValid_04, etc. I think the latest version should just be PageHeaderData and SizeOfPageHeaderData, and previous versions should be, e.g. PageHeaderDataV3. It looks to me like this would cut a few hunks out of this and maybe make it a bit easier to understand what is going on. At any rate, if we are going to stick with an explicit version number in both versions, it should be marked in a consistent way, not _04 sometimes and just 04 other times. My suggestion is e.g. "V4" but YMMV. The changes to nodeIndexscan.c and nodeSeqscan.c are worrisome to me. It looks like the added code is (nearly?) identical in both places, so probably it needs to be refactored to avoid code duplication. I'm also a bit skeptical about the idea of doing the tuple conversion here. Why here rather than ExecStoreTuple()? If you decide to convert the tuple, you can palloc the new one, pfree the old one if ShouldFree is set, and reset shouldFree to true. I am pretty skeptical of the idea that all of the HeapTuple* functions can just be conditionalized on the page version and everything will Just Work. It seems like that is too low a level to be worrying about such things. Even if it happens to work for the changes between V3 and V4, what happens when V5 or V6 is changed in such a way that the answer to HeapTupleIsWhatever is neither "Yes" nor "No", but rather "Maybe" or "Seven"? The performance hit also sounds painful. I don't have a better idea right now though... I think it's going to be absolutely imperative to begin vacuuming away old V3 pages as quickly as possible after the upgrade. If you go with the approach of converting the tuple in, or just before, ExecStoreTuple, then you're going to introduce a lot of overhead when working with V3 pages. I think that's fine. You should plan to do your in-place upgrade at 1AM on Christmas morning (or whenever your load hits rock bottom...) and immediately start converting the database, starting with your most important and smallest tables. In fact, I would look whenever possible for ways to make the V4 case a fast-path and just accept that the system is going to labor a bit when dealing with V3 stuff. Any overhead you introduce when dealing with V3 pages can go away; any V4 overhead is permanent and therefore much more difficult to accept. That's about all I have for now... if you can give me some pointers on working with this git repository, or provide a complete patch that applies cleanly to CVS HEAD, I will try to look at this in more detail. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers