Re: [HACKERS] [WIP] In-place upgrade

2008-11-27 Thread Zdenek Kotala

Robert Haas wrote:
>>> 1. htup and bufpage API clean up
>>> 2. HeapTuple version extension + code cleanup
>>> 3. In-place online upgrade
>>> 4. Extending pg_class info + more flexible TOAST chunk size
>> big thanks for your review. I think #1 is still partially valid, because it
>> contains general cleanups, but part of it is not necessary now. #2, #3 and
>> #4 you can move to return with feedback section.
>
> OK, when can you submit a new version of #1 with the parts that are
> still valid, updated to CVS HEAD, etc?

It does not have priority now. I'm working on space reservation first.

Thanks, Zdenek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] In-place upgrade

2008-11-26 Thread Zdenek Kotala

Alvaro Herrera wrote:
> Robert Haas wrote:
>> With respect to #4, I know that Alvaro submitted a draft patch, but
>> I'm not clear on whether that needs to be reviewed, because:
>>
>> - I'm not sure whether it's close enough to being finished for a
>> review to be a good use of time.
>> - I'm not sure how much you and Heikki have already reviewed it.
>> - I'm not sure whether this patch buys us anything by itself.
>
> I finished that patch, but I didn't submit it because in later
> discussion it turned out (at least as I read it) that it's considered to
> be unnecessary.

From the pg_upgrade perspective, it is something we will need to do anyway,
because TOAST_MAX_CHUNK_SIZE will be different in 8.5 (if you commit CRC). Then
we will need the patch for 8.5. It is not necessary for the 8.3->8.4 upgrade
because TOAST_MAX_CHUNK_SIZE is the same, and making this change to the toast
table now would add unnecessary complexity.


Zdenek



Re: [HACKERS] [WIP] In-place upgrade

2008-11-26 Thread Alvaro Herrera
Robert Haas wrote:

> With respect to #4, I know that Alvaro submitted a draft patch, but
> I'm not clear on whether that needs to be reviewed, because:
> 
> - I'm not sure whether it's close enough to being finished for a
> review to be a good use of time.
> - I'm not sure how much you and Heikki have already reviewed it.
> - I'm not sure whether this patch buys us anything by itself.

I finished that patch, but I didn't submit it because in later
discussion it turned out (at least as I read it) that it's considered to
be unnecessary.

-- 
Alvaro Herrera                        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] [WIP] In-place upgrade

2008-11-26 Thread Robert Haas
>> 1. htup and bufpage API clean up
>> 2. HeapTuple version extension + code cleanup
>> 3. In-place online upgrade
>> 4. Extending pg_class info + more flexible TOAST chunk size
> big thanks for your review. I think #1 is still partially valid, because it
> contains general cleanups, but part of it is not necessary now. #2, #3 and
> #4 you can move to return with feedback section.

OK, when can you submit a new version of #1 with the parts that are
still valid, updated to CVS HEAD, etc?

Thanks,

...Robert



Re: [HACKERS] [WIP] In-place upgrade

2008-11-26 Thread Zdenek Kotala

Robert,

Many thanks for your review. I think #1 is still partially valid because it
contains general cleanups, although part of it is no longer necessary. You can
move #2, #3 and #4 to the "Returned with feedback" section.


Thanks, Zdenek

Robert Haas wrote:

Zdenek -

I am a bit murky on where we stand with upgrade-in-place in terms of
reviewing.  Initially, you had submitted four patches for this
commitfest:

1. htup and bufpage API clean up
2. HeapTuple version extension + code cleanup
3. In-place online upgrade
4. Extending pg_class info + more flexible TOAST chunk size

I think that it was decided that replacing the heap tuple access
macros with function calls was not acceptable, so I have moved patches
#1 and #2 to the "Returned with feedback" section.  I thought that
perhaps the third patch could be salvaged, but the consensus seemed to
be to go in a new direction, so I'm thinking that one should probably
be moved to "Returned with feedback" as well.  However, I'm not clear
on whether you will be submitting something else instead and whether
that thing should be considered material for this commitfest.  Can you
let me know how you are thinking about this?

With respect to #4, I know that Alvaro submitted a draft patch, but
I'm not clear on whether that needs to be reviewed, because:

- I'm not sure whether it's close enough to being finished for a
review to be a good use of time.
- I'm not sure how much you and Heikki have already reviewed it.
- I'm not sure whether this patch buys us anything by itself.

Thoughts?

...Robert





Re: [HACKERS] [WIP] In-place upgrade

2008-11-25 Thread Robert Haas
Zdenek -

I am a bit murky on where we stand with upgrade-in-place in terms of
reviewing.  Initially, you had submitted four patches for this
commitfest:

1. htup and bufpage API clean up
2. HeapTuple version extension + code cleanup
3. In-place online upgrade
4. Extending pg_class info + more flexible TOAST chunk size

I think that it was decided that replacing the heap tuple access
macros with function calls was not acceptable, so I have moved patches
#1 and #2 to the "Returned with feedback" section.  I thought that
perhaps the third patch could be salvaged, but the consensus seemed to
be to go in a new direction, so I'm thinking that one should probably
be moved to "Returned with feedback" as well.  However, I'm not clear
on whether you will be submitting something else instead and whether
that thing should be considered material for this commitfest.  Can you
let me know how you are thinking about this?

With respect to #4, I know that Alvaro submitted a draft patch, but
I'm not clear on whether that needs to be reviewed, because:

- I'm not sure whether it's close enough to being finished for a
review to be a good use of time.
- I'm not sure how much you and Heikki have already reviewed it.
- I'm not sure whether this patch buys us anything by itself.

Thoughts?

...Robert



Re: [HACKERS] [WIP] In-place upgrade

2008-11-10 Thread Jeff


On Nov 9, 2008, at 11:09 PM, Joshua D. Drake wrote:

>> I think it's time for people to stop asking for the moon and realize
>> that if we don't constrain this feature pretty darn tightly, we will
>> have *nothing at all* for 8.4.  Again.
>
> Gotta go with Tom on this one. The idea that we would somehow upgrade
> from 8.1 to 8.4 is silly. Yes it will be unfortunate for those running
> 8.1 but keeping track of multi version like that is going to be entirely
> too expensive.



I agree as well. If we can get at least the base-level stuff into 8.4
so that 8.5 and beyond are in-place upgradable, then that is a huge
win. If we could support 8.2 or 8.3 or 6.5 :) that would be nice, but I
think dealing with everything retroactively will cause our heads to
explode and a mountain of awful code to arise. If we say "8.4 and
beyond will be upgradable" we can toss in everything we think we'll
need to deal with it and not worry about the retroactive case (unless
someone has a really clever(tm) idea!)


This can't be an original problem to solve; too many other databases
do it as well.


--
Jeff Trout <[EMAIL PROTECTED]>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/






Re: [HACKERS] [WIP] In-place upgrade

2008-11-10 Thread Joshua D. Drake
On Mon, 2008-11-10 at 09:14 -0500, Matthew T. O'Connor wrote:
> Tom Lane wrote:
> > Decibel! <[EMAIL PROTECTED]> writes:
> >   
> >> I think that's pretty seriously un-desirable. It's not at all  
> >> uncommon for databases to stick around for a very long time and then  
> >> jump ahead many versions. I don't think we want to tell people they  
> >> can't do that.
> >> 
> >
> > Of course they can do that --- they just have to do it one version at a
> > time.
> 
> Also, people may be less likely to stick with an old outdated version 
> for years and years if the upgrade process is easier.

Kind of OT, but I don't agree with this. There will always be those who
are willing to upgrade just because they can, but the smart play is to
upgrade because you need to. If anything, in-place upgrade is just going
to remove the last real business and technical barrier to using
PostgreSQL for enterprises.

Joshua D. Drake



Re: [HACKERS] [WIP] In-place upgrade

2008-11-10 Thread Matthew T. O'Connor

Tom Lane wrote:
> Decibel! <[EMAIL PROTECTED]> writes:
>> I think that's pretty seriously un-desirable. It's not at all
>> uncommon for databases to stick around for a very long time and then
>> jump ahead many versions. I don't think we want to tell people they
>> can't do that.
>
> Of course they can do that --- they just have to do it one version at a
> time.

Also, people may be less likely to stick with an old outdated version
for years and years if the upgrade process is easier.





Re: [HACKERS] [WIP] In-place upgrade

2008-11-10 Thread Zdenek Kotala

Decibel! wrote:
> Unless I'm mistaken, there are only two cases we care about for
> additional space: per-page and per-tuple.

Yes. The special space of indexes might also need to be extended, but that is
covered by the per-page setting.

> Those requirements could also
> vary for different types of pg_class objects. What we need is an API
> that allows an administrator to tell the database to start setting this
> space aside. One possibility:

We need an API or mechanism by which the in-place upgrade sets this up. It
must be done by the in-place upgrade itself.

> relkind: Essentially, heap vs toast, though I suppose it's possible we
> might need this for sequences.

Sequences are converted during the catalog upgrade.

> Once we have an API, we need to get users to make use of it. I'm
> thinking add something like the following to the release notes:
>
> "To upgrade from a prior version to 8.4, you will need to run some of
> the following commands, depending on what version you are currently using:

That is too complicated. First, the values also depend on the architecture,
and the in-place upgrade script can easily compute them. All you should need
to do is run a script which makes all the settings for you. You could obtain
it from the next version (IIRC Oracle does it this way), or we could add this
configuration script to the previous version in a minor update.

> OTOH, we might not want to go mucking around with changing the catalog
> for older versions (I'm not even sure if we can). So perhaps it would be
> better to store this information in a separate table, or maybe a
> separate file. That might be best anyway; we generally wouldn't need
> this information, so it would be nice if it wasn't bloating pg_class all
> the time.

That is why I selected reloptions for storing this configuration parameter.
They are supported from 8.2 on, and the upgrade from 8.1->8.2 works fine.


Zdenek




Re: [HACKERS] [WIP] In-place upgrade

2008-11-09 Thread Joshua D. Drake
On Sun, 2008-11-09 at 20:02 -0500, Tom Lane wrote:
> Decibel! <[EMAIL PROTECTED]> writes:
> > I think that's pretty seriously un-desirable. It's not at all  
> > uncommon for databases to stick around for a very long time and then  
> > jump ahead many versions. I don't think we want to tell people they  
> > can't do that.
> 
> Of course they can do that --- they just have to do it one version at a
> time.
> 
> I think it's time for people to stop asking for the moon and realize
> that if we don't constrain this feature pretty darn tightly, we will
> have *nothing at all* for 8.4.  Again.

Gotta go with Tom on this one. The idea that we would somehow upgrade
from 8.1 to 8.4 is silly. Yes it will be unfortunate for those running
8.1 but keeping track of multi version like that is going to be entirely
too expensive.

At some point it won't matter but right now it really does.

Joshua D. Drake

> 
>   regards, tom lane
> 


Re: [HACKERS] [WIP] In-place upgrade

2008-11-09 Thread Tom Lane
Decibel! <[EMAIL PROTECTED]> writes:
> I think that's pretty seriously un-desirable. It's not at all  
> uncommon for databases to stick around for a very long time and then  
> jump ahead many versions. I don't think we want to tell people they  
> can't do that.

Of course they can do that --- they just have to do it one version at a
time.

I think it's time for people to stop asking for the moon and realize
that if we don't constrain this feature pretty darn tightly, we will
have *nothing at all* for 8.4.  Again.

regards, tom lane



Re: [HACKERS] [WIP] In-place upgrade

2008-11-09 Thread Decibel!

On Nov 6, 2008, at 1:31 PM, Bruce Momjian wrote:
>> 3. What about multi-release upgrades?  Say someone wants to upgrade
>> from 8.3 to 8.6.  8.6 only knows how to read pages that are
>> 8.5-and-a-half or better, 8.5 only knows how to read pages that are
>> 8.4-and-a-half or better, and 8.4 only knows how to read pages that
>> are 8.3-and-a-half or better.  So the user will have to upgrade to
>> 8.3.MAX, then 8.4.MAX, then 8.5.MAX, and then 8.6.
>
> Yes.



I think that's pretty seriously un-desirable. It's not at all  
uncommon for databases to stick around for a very long time and then  
jump ahead many versions. I don't think we want to tell people they  
can't do that.


More importantly, I think we're barking up the wrong tree by putting  
migration knowledge into old versions. All that the old versions need  
to do is guarantee a specific amount of free space per page. We  
should provide a mechanism to tell a cluster what that free space  
requirement is, and not hard-code it into the backend.


Unless I'm mistaken, there are only two cases we care about for  
additional space: per-page and per-tuple. Those requirements could  
also vary for different types of pg_class objects. What we need is an  
API that allows an administrator to tell the database to start  
setting this space aside. One possibility:


pg_min_free_space( version, relkind, bytes_per_page, bytes_per_tuple );
pg_min_free_space_index( version, indexkind, bytes_per_page, bytes_per_tuple );


version: This would be provided as a safety mechanism. You would have  
to provide the major version that matches what the backend is  
running. See below for an example.


relkind: Essentially, heap vs toast, though I suppose it's possible  
we might need this for sequences.


indexkind: Because we support different types of indexes, I think we  
need to handle them differently than heap/toast. If we wanted, we  
could have a single function that demands that indexkind is NULL if  
relkind != 'index'.


bytes_per_(page|tuple): obvious. :)


Once we have an API, we need to get users to make use of it. I'm  
thinking add something like the following to the release notes:


"To upgrade from a prior version to 8.4, you will need to run some of  
the following commands, depending on what version you are currently  
using:


For version 8.3:
SELECT pg_min_free_space( '8.3', 'heap', 4, 12 );
SELECT pg_min_free_space( '8.3', 'toast', 4, 12 );

For version 8.2:
SELECT pg_min_free_space( '8.2', 'heap', 14, 12 );
SELECT pg_min_free_space( '8.2', 'toast', 14, 12 );
SELECT pg_min_free_space_index( '8.2', 'b-tree', 4, 4);"

(Note I'm just pulling numbers out of thin air in this example.)

As you can see, we pass in the version number to ensure that if  
someone accidentally cut and pastes the wrong stuff they know what  
they did wrong immediately.


One downside to this scheme is that it doesn't provide a mechanism to  
ensure that all required minimum free space requirements were passed  
in. Perhaps we want a function that takes an array of complex types  
and forces you to supply information for all known storage  
mechanisms. Another possibility would be to pass in some kind of  
binary format that contains a checksum.
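
As a rough illustration of the version guard described above, here is a minimal Python model. Everything in it is hypothetical: `pg_min_free_space` is only the function proposed in this message, not an existing PostgreSQL function, and `SERVER_MAJOR_VERSION`, `KNOWN_RELKINDS` and `missing_relkinds` are names made up for the sketch. It also shows one simple way to notice storage kinds for which no requirement was ever supplied.

```python
# Hypothetical model of the proposed call; nothing here exists in PostgreSQL.
SERVER_MAJOR_VERSION = "8.3"          # assumed version of the running backend
KNOWN_RELKINDS = {"heap", "toast"}    # assumed set of storage kinds

# relkind -> (bytes_per_page, bytes_per_tuple)
min_free_space = {}


def pg_min_free_space(version, relkind, bytes_per_page, bytes_per_tuple):
    """Record a free-space requirement, refusing commands for another version."""
    if version != SERVER_MAJOR_VERSION:
        raise ValueError(
            f"commands are for {version}, but this server is {SERVER_MAJOR_VERSION}"
        )
    if relkind not in KNOWN_RELKINDS:
        raise ValueError(f"unknown relkind: {relkind}")
    if bytes_per_page < 0 or bytes_per_tuple < 0:
        raise ValueError("byte counts must be non-negative")
    min_free_space[relkind] = (bytes_per_page, bytes_per_tuple)


def missing_relkinds():
    """Storage kinds for which no requirement has been supplied yet."""
    return KNOWN_RELKINDS - set(min_free_space)
```

Pasting the 8.2 commands into an 8.3 server fails immediately in this model, and `missing_relkinds()` stays non-empty until every known kind has been covered.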


Even if we do come up with a pretty fool-proof way to tell the old  
version what free space it needs to set aside, I think we should  
still have a mechanism for the new version to know exactly what the  
old version has set aside, and if it's actually been accomplished or  
not. One option that comes to mind is to add min_free_space_per_page  
and min_free_space_per_tuple to pg_class. Normally these fields would  
be NULL; the old version would only set them once it had verified  
that all pages in a given relation met those requirements (presumably  
via vacuum). The new version would check all these values on startup  
to ensure they made sense.
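
A small Python sketch of that idea, under stated assumptions: the `Relation` class, `vacuum_verify` and `startup_check` are invented for illustration, `None` stands in for NULL, and only the per-page field is modeled to keep it short.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Relation:
    name: str
    page_free_bytes: List[int]                      # free bytes on each page
    min_free_space_per_page: Optional[int] = None   # None plays the role of NULL


def vacuum_verify(rel: Relation, required: int) -> bool:
    """Set the field only once every page is known to meet the requirement."""
    if all(free >= required for free in rel.page_free_bytes):
        rel.min_free_space_per_page = required
        return True
    return False


def startup_check(relations: List[Relation], required: int) -> List[str]:
    """Names of relations the new version could not accept as prepared."""
    return [
        rel.name
        for rel in relations
        if rel.min_free_space_per_page is None
        or rel.min_free_space_per_page < required
    ]
```

The point of leaving the field unset until verification succeeds is that the new version never has to trust a requirement that was merely requested.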


OTOH, we might not want to go mucking around with changing the  
catalog for older versions (I'm not even sure if we can). So perhaps  
it would be better to store this information in a separate table, or  
maybe a separate file. That might be best anyway; we generally  
wouldn't need this information, so it would be nice if it wasn't  
bloating pg_class all the time.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828





Re: [HACKERS] [WIP] In-place upgrade

2008-11-07 Thread Tom Lane
Zdenek Kotala <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> * Add a "format serial number" column to pg_class, and probably also
>> pg_database.  Rather like the frozenxid columns, this would have the
>> semantics that all pages in a relation or database are known to have at
>> least the specified format number.

> I prefer to have latest processed block. InvalidBlockNumber would mean
> nothing is processed and 0 means everything is already reserved. I
> suggest to process it backward. It should prevent to check new
> extended block which will be already correctly setup.

That seems bizarre and not very helpful.  In the first place, if we're
driving it off vacuum there would be no opportunity for recording a
half-processed state value.  In the second place, this formulation fails
to provide any evidence of *what* processing you completed or didn't
complete.  In a multi-step upgrade sequence I think it's going to be a
mess if we aren't explicit about that.
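
For readers following along, the quoted "latest processed block" marker could be modeled roughly as below. This is only a sketch: `process_backward` and `reserve_page` are hypothetical names, and only the `InvalidBlockNumber` value (0xFFFFFFFF) is taken from PostgreSQL. Note that the marker records how far processing got but not which processing step it refers to, which is the objection raised here.

```python
INVALID_BLOCK_NUMBER = 0xFFFFFFFF  # value of InvalidBlockNumber in PostgreSQL


def process_backward(nblocks, last_processed, budget, reserve_page):
    """Process up to `budget` blocks, walking from the end toward block 0.

    `last_processed` is the stored marker: INVALID_BLOCK_NUMBER means nothing
    has been processed yet, 0 means the whole relation is done. Blocks added
    after the first pass are assumed to be set up correctly already, so a
    resumed pass simply continues below the marker.
    """
    if nblocks == 0:
        return 0  # an empty relation has nothing left to process
    if last_processed == INVALID_BLOCK_NUMBER:
        next_block = nblocks - 1
    else:
        next_block = last_processed - 1
    while budget > 0 and next_block >= 0:
        reserve_page(next_block)      # hypothetical per-page preparation step
        last_processed = next_block
        next_block -= 1
        budget -= 1
    return last_processed
```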

regards, tom lane



Re: [HACKERS] [WIP] In-place upgrade

2008-11-07 Thread Zdenek Kotala

Tom Lane wrote:
> I think we can have a notion of pre-upgrade maintenance, but it would
> have to be integrated into normal operations.  For instance, if
> conversion to 8.4 requires extra free space, we'd make late releases
> of 8.3.x not only be able to force that to occur, but also tweak the
> normal code paths to maintain that minimum free space.

OK. I will focus on this. I guess this approach revives my hook patch:

http://archives.postgresql.org/pgsql-hackers/2008-04/msg00990.php

> The full concept as I understood it (dunno why Bruce left all these
> details out of his message) went like this:
>
> * Add a "format serial number" column to pg_class, and probably also
> pg_database.  Rather like the frozenxid columns, this would have the
> semantics that all pages in a relation or database are known to have at
> least the specified format number.
>
> * There would actually be two serial numbers per release, at least for
> releases where pre-update prep work is involved --- for instance,
> between 8.3 and 8.4 there'd be an "8.3-and-a-half" format which is
> 8.3 but known ready to update to 8.4 (eg, enough free space available).
> Minor releases of 8.3 that appear with or subsequent to 8.4 release
> understand the "half" format number and how to upgrade to it.

I would prefer to record the last processed block. InvalidBlockNumber would
mean nothing has been processed and 0 would mean everything is already
reserved. I suggest processing the relation backward, which avoids checking
newly extended blocks, since those will already be set up correctly.

> * VACUUM would be empowered, in the same way as it handles frozenxid
> maintenance, to update any less-than-the-latest-version pages and then
> fix the pg_class and pg_database entries.
>
> * We could mechanically enforce that you not update until the database
> is ready for it by checking pg_database.datformatversion during
> postmaster startup.

I don't understand you here. Do you mean on the old server version or the new
one? Who would perform this check? Keep in mind that we currently do catalog
conversion by dump and import, which loses all extended information.
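
To make the quoted "format serial number" scheme concrete, here is a minimal Python model of it. The numeric values, the `Relation` class and the function names are all assumptions made for the sketch; only `datformatversion` echoes a name used in the quoted proposal, and no such column exists in PostgreSQL.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical serial numbers for 8.3, "8.3-and-a-half" and 8.4 page formats.
FORMAT_8_3, FORMAT_8_3_HALF, FORMAT_8_4 = 830, 835, 840


@dataclass
class Relation:
    pages: List[int]   # format number of each page
    relformat: int     # catalog claim: every page is at least this format


def vacuum(rel: Relation, target: int) -> None:
    """Bring every older page up to `target`, then advance the catalog entry."""
    rel.pages = [max(fmt, target) for fmt in rel.pages]
    rel.relformat = min(rel.pages, default=target)


def datformatversion(relations: Dict[str, Relation]) -> int:
    """Database-wide value: the minimum over all relations."""
    return min(rel.relformat for rel in relations.values())


def can_start(relations: Dict[str, Relation], oldest_readable: int) -> bool:
    """Startup check: refuse to run on a database that is not ready."""
    return datformatversion(relations) >= oldest_readable
```

As with the frozenxid columns, the stored number is a lower bound, so a single unprocessed relation keeps the database-wide value, and hence the startup check, at the old format.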


Thanks Zdenek


--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql




Re: [HACKERS] [WIP] In-place upgrade

2008-11-07 Thread Zdenek Kotala

Tom Lane wrote:
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
>> Adding catalog columns seems rather complicated, and not back-patchable.
>
> Agreed, we'd not be able to make them retroactively appear in 8.3.
>
>> I imagined that you would have just a single cluster-wide variable, a
>> GUC perhaps, indicating how much space should be reserved by
>> updates/inserts. Then you'd have an additional program, perhaps a new
>> contrib module, that sets the variable to the right value for the
>> version you're upgrading, and scans through all tables, moving tuples so
>> that every page has enough free space for the upgrade. After that's
>> done, it'd set a flag in the data directory indicating that the cluster
>> is ready for upgrade.
>
> Possibly that could work.  The main thing is to have a way of being sure
> that the prep work has been completed on every page of the database.
> The disadvantage of not having catalog support is that you'd have to
> complete the entire scan operation in one go to be sure you'd hit
> everything.

I prefer to have catalog support. Especially on very long tables, it helps
when somebody stops the pre-upgrade script for some reason.

> Another thought here is that I don't think we are yet committed to any
> changes that require extra space between 8.3 and 8.4, are we?  The
> proposed addition of CRC words could be put off to 8.5, for instance.
> So it seems at least within reach to not require any preparatory steps
> for 8.3-to-8.4, and put the infrastructure in place now to support such
> steps in future go-rounds.

Yeah. We still have V4 without any storage modification (except for the HASH
index). However, I think that if reloptions are used for storing information
about reserved space, then it shouldn't be a problem. But we need to be sure
that it is possible.


Zdenek

--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql




Re: [HACKERS] [WIP] In-place upgrade

2008-11-07 Thread Zdenek Kotala

Heikki Linnakangas wrote:
> Tom Lane wrote:
>> I think we can have a notion of pre-upgrade maintenance, but it would
>> have to be integrated into normal operations.  For instance, if
>> conversion to 8.4 requires extra free space, we'd make late releases
>> of 8.3.x not only be able to force that to occur, but also tweak the
>> normal code paths to maintain that minimum free space.
>
> Agreed, the backend needs to be modified to reserve the space.
>
>> The full concept as I understood it (dunno why Bruce left all these
>> details out of his message) went like this:
>>
>> * Add a "format serial number" column to pg_class, and probably also
>> pg_database.  Rather like the frozenxid columns, this would have the
>> semantics that all pages in a relation or database are known to have at
>> least the specified format number.
>>
>> * There would actually be two serial numbers per release, at least for
>> releases where pre-update prep work is involved --- for instance,
>> between 8.3 and 8.4 there'd be an "8.3-and-a-half" format which is
>> 8.3 but known ready to update to 8.4 (eg, enough free space available).
>> Minor releases of 8.3 that appear with or subsequent to 8.4 release
>> understand the "half" format number and how to upgrade to it.
>>
>> * VACUUM would be empowered, in the same way as it handles frozenxid
>> maintenance, to update any less-than-the-latest-version pages and then
>> fix the pg_class and pg_database entries.
>>
>> * We could mechanically enforce that you not update until the database
>> is ready for it by checking pg_database.datformatversion during
>> postmaster startup.
>
> Adding catalog columns seems rather complicated, and not back-patchable.
> Not backpatchable means that we'd need to be sure now that the format
> serial numbers are enough for the upcoming 8.4-8.5 upgrade.

Reloptions are suitable for keeping the amount of reserved space, and they can
be backported to 8.3 and 8.2. And of course there is no problem converting
8.1->8.2.

For backported branches it would be better to combine an internal
modification (preserving the space) with, e.g., a stored procedure which
checks all relations.

In 8.4 and newer, pg_class could be extended with new attributes.

> I imagined that you would have just a single cluster-wide variable, a
> GUC perhaps, indicating how much space should be reserved by
> updates/inserts.

You sometimes need a different reserved size for different types of relation.
For example, on 32-bit x86 you don't need to reserve space for the heap, but
you do need to for indexes (between v3->v4). It is better to use reloptions
and have the pre-upgrade procedure set this information correctly.

> Then you'd have an additional program, perhaps a new
> contrib module, that sets the variable to the right value for the
> version you're upgrading, and scans through all tables, moving tuples so
> that every page has enough free space for the upgrade. After that's
> done, it'd set a flag in the data directory indicating that the cluster
> is ready for upgrade.

I prefer to have this information in pg_class, where it is accessible by SQL
commands. pg_class should also contain information about the last checked
page, to avoid repeated checks on very large tables.

> The tool could run concurrently with normal activity, so you could just
> let it run for as long as it takes.

Agreed.
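
As a toy model of the pre-upgrade scan discussed above, the following Python sketch moves tuples off pages that are too full and then checks readiness. It is only an illustration under simplifying assumptions: pages are lists of tuple sizes, the `reserve` value would in practice come from a GUC or a per-relation reloption, and all function names are made up. A real tool would also try to reuse free space on existing pages before extending the relation.

```python
def prepare_page(tuples, page_size, reserve):
    """Split tuple sizes into (kept, moved) so kept ones leave `reserve` free."""
    kept, moved, used = [], [], 0
    for size in tuples:
        if used + size <= page_size - reserve:
            kept.append(size)
            used += size
        else:
            moved.append(size)
    return kept, moved


def prepare_relation(pages, page_size, reserve):
    """Return a new page list in which every page keeps `reserve` bytes free."""
    result, overflow = [], []
    for tuples in pages:
        kept, moved = prepare_page(tuples, page_size, reserve)
        result.append(kept)
        overflow.extend(moved)
    # Simplification: moved tuples always go to fresh pages at the end.
    current, used = [], 0
    for size in overflow:
        if current and used + size > page_size - reserve:
            result.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        result.append(current)
    return result


def is_ready(pages, page_size, reserve):
    """True once every page has at least `reserve` bytes of free space."""
    return all(page_size - sum(tuples) >= reserve for tuples in pages)
```

Only when `is_ready` holds for every relation would the cluster be marked as ready for upgrade, whether that mark lives in a flag file or in pg_class.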

Zdenek

--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql




Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Greg Smith

On Thu, 6 Nov 2008, Tom Lane wrote:

>> -Is it worth considering making CRCs an optional compile-time feature, and
>> that (for now at least) you couldn't get them and the in-place upgrade at
>> the same time?
>
> Hmm ... might be better than not offering them in 8.4 at all, but the
> thing is that then you are asking packagers to decide for their
> customers which is more important.  And I'd bet you anything you want
> that in-place upgrade would be their choice.


I was thinking of something similar to how --enable-thread-safety has been 
rolled out.  It could be hanging around there and available to those who 
want it in their build, even though it might not be available by default 
in a typical mainstream distribution.  Since there's already a GUC for 
toggling the checksums in the code, internally it could work like 
debug_assertions where you only get that option if support was compiled in 
appropriately.  Just a thought I wanted to throw out there, if it makes 
eventual upgrades from 8.4 more complicated it may not be worth even 
considering.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Tom Lane
Greg Smith <[EMAIL PROTECTED]> writes:
> On Thu, 6 Nov 2008, Tom Lane wrote:
>> Another thought here is that I don't think we are yet committed to any
>> changes that require extra space between 8.3 and 8.4, are we?  The
>> proposed addition of CRC words could be put off to 8.5, for instance.

> I was just staring at that code as you wrote this thinking about the same 
> thing. ...

> -Is it worth considering making CRCs an optional compile-time feature, and 
> that (for now at least) you couldn't get them and the in-place upgrade at 
> the same time?

Hmm ... might be better than not offering them in 8.4 at all, but the
thing is that then you are asking packagers to decide for their
customers which is more important.  And I'd bet you anything you want
that in-place upgrade would be their choice.

Also, having such an option would create extra complexity for 8.4-to-8.5
upgrades.

regards, tom lane



Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Robert Haas
> The idea that you're going to get in-place upgrade all the way back to 8.2
> without taking the database down for a even little bit to run such a utility
> is hard to pull off, and it's impressive that Zdenek and everyone else
> involved has gotten so close to doing it.

I think we should at least wait to see what the next version of his
patch looks like before making any final decisions.

...Robert



Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Greg Smith

On Thu, 6 Nov 2008, Tom Lane wrote:

> Another thought here is that I don't think we are yet committed to any
> changes that require extra space between 8.3 and 8.4, are we?  The
> proposed addition of CRC words could be put off to 8.5, for instance.


I was just staring at that code as you wrote this thinking about the same 
thing.  CRCs are a great feature I'd really like to see.  On the other 
hand, announcing that 8.4 features in-place upgrades for 8.3 databases, 
and that the project has laid the infrastructure such that future releases 
will also upgrade in-place, would IMHO be the biggest positive 
announcement of the new release by a large margin.  At least then new 
large (>1TB) installs could kick off on either the stable 8.3 or 8.4 
knowing they'd never be forced to deal with dump/reload, whereas right now 
there is no reasonable solution for them that involves PostgreSQL (I just 
crossed 3TB on a system last month and I'm not looking forward to its 
future upgrades).


Two questions come to mind here:

-If you reduce the page layout upgrade problem to "convert from V4 to V5 
adding support for CRCs", is there a worthwhile simpler path to handling 
that without dragging the full complexity of the older page layout changes 
in?


-Is it worth considering making CRCs an optional compile-time feature, and 
that (for now at least) you couldn't get them and the in-place upgrade at 
the same time?


Stepping back for a second, the idea that in-place upgrade is only 
worthwhile if it yields zero downtime isn't necessarily the case.  Even 
having an offline-only upgrade tool to handle the more complicated 
situations where tuples have to be squeezed onto another page would still 
be a major improvement over the current situation.  The thing that you 
have to recognize here is that dump/reload is extremely slow because of 
bottlenecks in the COPY process.  That makes for a large amount of 
downtime--many hours isn't unusual.


If older version upgrade downtime was reduced to how long it takes to run 
a "must scan every page and fiddle with it if full" tool, that would still 
be a giant improvement over the current state of things.  If Zdenek's 
figure that only a small percentage of pages will need such adjustment 
holds up, that should take only some factor longer than a sequential scan 
of the whole database.  That's not instant, but it's at least an order of 
magnitude faster than a dump/reload on a big system.


The idea that you're going to get in-place upgrade all the way back to 8.2 
without taking the database down for even a little bit to run such a 
utility is hard to pull off, and it's impressive that Zdenek and everyone 
else involved has gotten so close to doing it.  I personally am on the 
fence as to whether it's worth paying even the 1% penalty for that 
implementation all the time just to get in-place upgrades.  If an offline 
utility with reasonable (scan instead of dump/reload) downtime and closer 
to zero overhead when finished was available instead, that might be a more 
reasonable trade-off to make for handling older releases.  There are so 
many bottlenecks in the older versions that you're less likely to find a 
database too large to dump and reload there anyway.  It would also be the 
case that improvements to that offline utility could continue after 8.4 
proper was completely frozen.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Adding catalog columns seems rather complicated, and not back-patchable. 

Agreed, we'd not be able to make them retroactively appear in 8.3.

> I imagined that you would have just a single cluster-wide variable, a 
> GUC perhaps, indicating how much space should be reserved by 
> updates/inserts. Then you'd have an additional program, perhaps a new 
> contrib module, that sets the variable to the right value for the 
> version you're upgrading, and scans through all tables, moving tuples so 
> that every page has enough free space for the upgrade. After that's 
> done, it'd set a flag in the data directory indicating that the cluster 
> is ready for upgrade.

Possibly that could work.  The main thing is to have a way of being sure
that the prep work has been completed on every page of the database.
The disadvantage of not having catalog support is that you'd have to
complete the entire scan operation in one go to be sure you'd hit
everything.

Another thought here is that I don't think we are yet committed to any
changes that require extra space between 8.3 and 8.4, are we?  The
proposed addition of CRC words could be put off to 8.5, for instance.
So it seems at least within reach to not require any preparatory steps
for 8.3-to-8.4, and put the infrastructure in place now to support such
steps in future go-rounds.

regards, tom lane



Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Tom Lane
"Robert Haas" <[EMAIL PROTECTED]> writes:
> That means, in essence, that the earliest possible version that could
> be in-place upgraded would be an 8.4 system - we are giving up
> completely on in-place upgrade to 8.4 from any earlier version (which
> personally I thought was the whole point of this feature in the first
> place).

Quite honestly, given where we are in the schedule and the lack of
consensus about how to do this, I think we would be well advised to
decide right now to forget about supporting in-place upgrade to 8.4,
and instead work on allowing in-place upgrades from 8.4 onwards.
Shooting for a general-purpose does-it-all scheme that can handle
old versions that had no thought of supporting such updates is likely
to ensure that we end up with *NOTHING*.

What Bruce is proposing, I think, is that we intentionally restrict what
we want to accomplish to something that might be within reach now and
also sustainable over the long term.  Planning to update any version to
any other version is *not* sustainable --- we haven't got the resources
nor the interest to create large amounts of conversion code.

regards, tom lane



Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> I envision a similar system where we have utilities to guarantee all
> pages have enough free space, and all pages are the current version,
> before allowing an upgrade-in-place to the next version.  Such a
> consistent API will make the job for users easier and our job simpler,
> and with upgrade-in-place, where we have limited time and resources to
> code this for each release, simplicity is important.

An external utility doesn't seem like the right way to approach it.
For example, given the need to ensure X amount of free space in each
page, the only way to guarantee that would be to shut down the database
while you run the utility over all the pages --- otherwise somebody
might fill some page up again.  And that completely defeats the purpose,
which is to have minimal downtime during upgrade.

I think we can have a notion of pre-upgrade maintenance, but it would
have to be integrated into normal operations.  For instance, if
conversion to 8.4 requires extra free space, we'd make late releases
of 8.3.x not only be able to force that to occur, but also tweak the
normal code paths to maintain that minimum free space.

The full concept as I understood it (dunno why Bruce left all these
details out of his message) went like this:

* Add a "format serial number" column to pg_class, and probably also
pg_database.  Rather like the frozenxid columns, this would have the
semantics that all pages in a relation or database are known to have at
least the specified format number.

* There would actually be two serial numbers per release, at least for
releases where pre-update prep work is involved --- for instance,
between 8.3 and 8.4 there'd be an "8.3-and-a-half" format which is
8.3 but known ready to update to 8.4 (eg, enough free space available).
Minor releases of 8.3 that appear with or subsequent to 8.4 release
understand the "half" format number and how to upgrade to it.

* VACUUM would be empowered, in the same way as it handles frozenxid
maintenance, to update any less-than-the-latest-version pages and then
fix the pg_class and pg_database entries.

* We could mechanically enforce that you not update until the database
is ready for it by checking pg_database.datformatversion during
postmaster startup.

So the update process would require users to install a suitably late
version of 8.3, vacuum everything over a suitable maintenance window,
then install 8.4, then perhaps vacuum everything again if they want to
try to push page update work into specific maintenance windows.  But
the DB is up and functioning the whole time.
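
A rough sketch of the bookkeeping described above, written in Python purely
for illustration. It is not code from any patch: the numeric format values,
the relformatversion field and the function names are assumptions, and only
datformatversion is named in this message. It models the "at least this
format" semantics, a VACUUM-like pass that raises them, and the startup check.

```python
from dataclasses import dataclass, field

# Hypothetical format serial numbers; the values are illustrative only.
FMT_83 = 830        # plain 8.3 page
FMT_83_HALF = 835   # 8.3 layout, but known ready for conversion to 8.4
FMT_84 = 840        # 8.4 page


@dataclass
class Relation:
    name: str
    page_formats: list                 # format serial number of each page
    relformatversion: int = FMT_83     # hypothetical pg_class column


@dataclass
class Database:
    relations: list = field(default_factory=list)
    datformatversion: int = FMT_83     # column name taken from the message


def vacuum(db, target):
    """Bring every page up to at least `target`, then fix the catalog entries."""
    for rel in db.relations:
        rel.page_formats = [max(fmt, target) for fmt in rel.page_formats]
        rel.relformatversion = min(rel.page_formats, default=target)
    db.datformatversion = min(
        (rel.relformatversion for rel in db.relations), default=target
    )


def can_start(db, minimum_readable):
    """Startup check: refuse to run on a database that is not ready."""
    return db.datformatversion >= minimum_readable


db = Database([Relation("t1", [FMT_83, FMT_83_HALF]), Relation("t2", [FMT_83])])
print(can_start(db, FMT_83_HALF))   # False: prep work not done yet
vacuum(db, FMT_83_HALF)
print(can_start(db, FMT_83_HALF))   # True: every page is at least 8.3-and-a-half
```

The per-relation minimum is what would let the work be spread over several
maintenance windows: a relation that has already been vacuumed keeps its
higher number even if others have not been touched yet.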

regards, tom lane



Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Robert Haas
> And almost guarantee that the job will never be completed, or tested
> fully.  Remember that in-place upgrades would be pretty painless so
> doing multiple major upgrades should not be a difficult requirement, or
> they can dump/reload their data to skip it.

Regardless of what design is chosen, there's no requirement that we
support in-place upgrade from 8.3 to 8.6, or even 8.4 to 8.6, in one
shot.  But the design that you and Tom are proposing pretty much
ensures that it will be impossible.

But that's certainly the least important reason not to do it this way.
 I think this comment from Heikki is pretty revealing:

> Adding catalog columns seems rather complicated, and not back-patchable. Not 
> backpatchable means that we'd need to be sure now
> that the format serial numbers are enough for the upcoming 8.4-8.5 upgrade.

That means, in essence, that the earliest possible version that could
be in-place upgraded would be an 8.4 system - we are giving up
completely on in-place upgrade to 8.4 from any earlier version (which
personally I thought was the whole point of this feature in the first
place).  And we'll only be able to in-place upgrade to 8.5 if the
unproven assumption that these catalog changes are sufficient turns
out to be true, or if whatever other changes turn out to be necessary
are back-patchable.

...Robert



Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Bruce Momjian
Robert Haas wrote:
> > That's all fine and dandy, except that it presumes that you can perform
> > SELECT/UPDATE/DELETE on V3 tuple versions; you can't just pretend that
> > A-E aren't there until they get converted.  Which is exactly the
> > overhead we were looking to avoid.
> 
> I don't understand this comment at all.  Unless you have some sort of
> magical wand in your back pocket that will instantaneously transform
> the entire database, there is going to be a period of time when you
> have to cope with both V3 and V4 pages.  ISTM that what we should be
> talking about here is:
> 
> (1) How are we going to do that in a way that imposes near-zero
> overhead once the entire database has been converted?
> (2) How are we going to do that in a way that is minimally invasive to the 
> code?
> (3) Can we accomplish (1) and (2) while still retaining somewhat
> reasonable performance for V3 pages?
> 
> Zdenek's initial proposal did this by replacing all of the tuple
> header macros with functions that were conditionalized on page
> version.  I think we agree that's not going to work.  That doesn't
> mean that there is no approach that can work, and we were discussing
> possible ways to make it work upthread until the thread got hijacked
> to discuss the right way of handling page expansion.  Now that it
> seems we agree that a transaction can be used to move tuples onto new
> pages, I think we'd be well served to stop talking about page
> expansion and get back to the original topic: where and how to insert
> the hooks for V3 tuple handling.

I think the above is a good summary.  For me, the problem with any
approach that has information about prior-version block formats in the
main code path is code complexity, and secondarily performance.

I know there is concern that converting all blocks on read-in might
expand the page beyond 8k in size.  One idea Heikki had was to require
that some tool be run on minor releases before a major upgrade to
guarantee there is enough free space to convert the block to the current
format on read-in, which would localize the information about prior
block formats.  We could release the tool in minor branches around the
same time as a major release.  Also consider that there are very few
releases that expand the page size.

For these reasons, the expand-the-page-beyond-8k problem should not be
dictating what approach we take for upgrade-in-place because there are
workarounds for the problem, and the problem is rare.  I would like us
to again focus on converting the pages to the current version format on
read-in, and perhaps a tool to convert all old pages to the new format.
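
A minimal sketch of convert-on-read-in, in Python for illustration only. The
page representation, the PageTooFull exception and the byte counts are
assumptions, not anything from the backend. The point it shows is that
knowledge of the old format sits in one place, and that a page without
enough reserved space cannot be converted.

```python
CURRENT_VERSION = 4   # page layout version used by the running server


class PageTooFull(Exception):
    """Raised when a page lacks the free space needed for conversion."""


def read_page(page, extra_bytes_needed):
    """Return the page in the current format, converting it on read-in if old.

    `page` is a dict with "version" and "free" (free bytes).
    `extra_bytes_needed` is how much the contents grow when converted.
    Both are illustrative stand-ins for the real page structure.
    """
    if page["version"] == CURRENT_VERSION:
        return page                       # fast path: no old-format logic
    if page["free"] < extra_bytes_needed:
        raise PageTooFull("pre-upgrade tool should have reserved space")
    page["free"] -= extra_bytes_needed
    page["version"] = CURRENT_VERSION
    return page


old = {"version": 3, "free": 64}
print(read_page(old, extra_bytes_needed=16))
```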

FYI, we are also going to need the ability to convert all pages to the
current format for multi-release upgrades.  For example, if you did
upgrade-in-place from 8.2 to 8.3, you are going to need to update all
pages to the 8.3 format before doing upgrade-in-place to 8.4;  perhaps
vacuum can do something like this on a per-table basis, and we can
record that status in a pg_class column.

Also, consider that when we did PITR, we required commands before and
after the tar so that there was a consistent API for PITR, and later had
to add capabilities to those functions, but the user API didn't change.

I envision a similar system where we have utilities to guarantee all
pages have enough free space, and all pages are the current version,
before allowing an upgrade-in-place to the next version.  Such a
consistent API will make the job for users easier and our job simpler,
and with upgrade-in-place, where we have limited time and resources to
code this for each release, simplicity is important.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Heikki Linnakangas

Tom Lane wrote:

> I think we can have a notion of pre-upgrade maintenance, but it would
> have to be integrated into normal operations.  For instance, if
> conversion to 8.4 requires extra free space, we'd make late releases
> of 8.3.x not only be able to force that to occur, but also tweak the
> normal code paths to maintain that minimum free space.


Agreed, the backend needs to be modified to reserve the space.


> The full concept as I understood it (dunno why Bruce left all these
> details out of his message) went like this:
>
> * Add a "format serial number" column to pg_class, and probably also
> pg_database.  Rather like the frozenxid columns, this would have the
> semantics that all pages in a relation or database are known to have at
> least the specified format number.
>
> * There would actually be two serial numbers per release, at least for
> releases where pre-update prep work is involved --- for instance,
> between 8.3 and 8.4 there'd be an "8.3-and-a-half" format which is
> 8.3 but known ready to update to 8.4 (eg, enough free space available).
> Minor releases of 8.3 that appear with or subsequent to 8.4 release
> understand the "half" format number and how to upgrade to it.
>
> * VACUUM would be empowered, in the same way as it handles frozenxid
> maintenance, to update any less-than-the-latest-version pages and then
> fix the pg_class and pg_database entries.
>
> * We could mechanically enforce that you not update until the database
> is ready for it by checking pg_database.datformatversion during
> postmaster startup.


Adding catalog columns seems rather complicated, and not back-patchable. 
Not backpatchable means that we'd need to be sure now that the format 
serial numbers are enough for the upcoming 8.4-8.5 upgrade.


I imagined that you would have just a single cluster-wide variable, a 
GUC perhaps, indicating how much space should be reserved by 
updates/inserts. Then you'd have an additional program, perhaps a new 
contrib module, that sets the variable to the right value for the 
version you're upgrading, and scans through all tables, moving tuples so 
that every page has enough free space for the upgrade. After that's 
done, it'd set a flag in the data directory indicating that the cluster 
is ready for upgrade.


The tool could run concurrently with normal activity, so you could just 
let it run for as long as it takes.
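
To make the idea concrete, here is a small Python sketch. Everything in it
is hypothetical: the reserved_space setting, the fits() check and the
prepare_cluster() tool are illustrative names, pages are modelled as plain
lists of tuple sizes, and concurrency is not modelled at all.

```python
PAGE_SIZE = 8192      # default PostgreSQL block size, in bytes
reserved_space = 0    # hypothetical cluster-wide setting, in bytes


def fits(page_used, tuple_size):
    """Insert/update path: use a page only if the reserve stays free afterwards."""
    return page_used + tuple_size <= PAGE_SIZE - reserved_space


def prepare_cluster(pages, reserve):
    """Hypothetical prep tool.

    `pages` is a list of pages, each a list of tuple sizes in bytes. Tuples
    are moved off any page that lacks `reserve` free bytes. Returns True when
    every page has the reserve, which is the point at which the
    "ready for upgrade" flag could be set.
    """
    global reserved_space
    reserved_space = reserve       # from now on normal inserts honour the reserve
    overflow = []
    for page in pages:
        while page and sum(page) > PAGE_SIZE - reserve:
            overflow.append(page.pop())
    for size in overflow:
        target = next((p for p in pages if fits(sum(p), size)), None)
        if target is None:         # no existing page has room: extend the table
            target = []
            pages.append(target)
        target.append(size)
    return all(sum(p) <= PAGE_SIZE - reserve for p in pages)


pages = [[4000, 4000, 150], [2000]]
ready = prepare_cluster(pages, reserve=100)
print(ready, pages)
```

Setting the reserve before scanning is what keeps already-processed pages
from filling up again while the scan is still running.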


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Bruce Momjian
Robert Haas wrote:
> The second point could probably be addressed with a GUC but the first
> one certainly can't.
> 
> 3. What about multi-release upgrades?  Say someone wants to upgrade
> from 8.3 to 8.6.  8.6 only knows how to read pages that are
> 8.5-and-a-half or better, 8.5 only knows how to read pages that are
> 8.4-and-a-half or better, and 8.4 only knows how to read pages that
> are 8.3-and-a-half or better.  So the user will have to upgrade to
> 8.3.MAX, then 8.4.MAX, then 8.5.MAX, and then 8.6.

Yes.

> It seems to me that if there is any way to put all of the logic to
> handle old page versions in the new code that would be much better,
> especially if it's an optional feature that can be compiled in or not.
>  Then when it's time to upgrade from 8.3 to 8.6 you could do:
> 
> ./configure --with-upgrade-83 --with-upgrade-84 --with-upgrade-85
> 
> but if you don't need the code to handle old page versions you can:
> 
> ./configure --without-upgrade-85
> 
> Admittedly, this requires making the new code capable of rearranging
> pages to create free space when necessary, and to be able to continue
> to execute queries while doing it, but ways of doing this have been
> proposed.  The only uncertainty is as to whether the performance and
> code complexity can be kept manageable, but I don't believe that
> question has been explored to the point where we should be ready to
> declare defeat.

And almost guarantee that the job will never be completed, or tested
fully.  Remember that in-place upgrades would be pretty painless so
doing multiple major upgrades should not be a difficult requirement, or
they can dump/reload their data to skip it.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Robert Haas
> An external utility doesn't seem like the right way to approach it.
> For example, given the need to ensure X amount of free space in each
> page, the only way to guarantee that would be to shut down the database
> while you run the utility over all the pages --- otherwise somebody
> might fill some page up again.  And that completely defeats the purpose,
> which is to have minimal downtime during upgrade.

Agreed.

> I think we can have a notion of pre-upgrade maintenance, but it would
> have to be integrated into normal operations.  For instance, if
> conversion to 8.4 requires extra free space, we'd make late releases
> of 8.3.x not only be able to force that to occur, but also tweak the
> normal code paths to maintain that minimum free space.

1. This seems to fly in the face of the sort of thing we've
traditionally back-patched.  The code to make pages ready for upgrade
to the next major release will not necessarily be straightforward (in
fact it probably isn't, otherwise we wouldn't have insisted on a
two-stage conversion process), which turns a seemingly safe minor
upgrade into a potentially dangerous operation.

2. Just because I want to upgrade to 8.3.47 and get the latest bug
fixes does not mean that I have any intention of upgrading to 8.4, and
yet you've rearranged all of my pages to have useless free space in
them (possibly at considerable and unexpected I/O cost for at least as
long as the conversion is running).

The second point could probably be addressed with a GUC but the first
one certainly can't.

3. What about multi-release upgrades?  Say someone wants to upgrade
from 8.3 to 8.6.  8.6 only knows how to read pages that are
8.5-and-a-half or better, 8.5 only knows how to read pages that are
8.4-and-a-half or better, and 8.4 only knows how to read pages that
are 8.3-and-a-half or better.  So the user will have to upgrade to
8.3.MAX, then 8.4.MAX, then 8.5.MAX, and then 8.6.
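
The chain in point 3 can be written out mechanically. This is only an
illustration in Python; upgrade_path() is a made-up helper and "X.MAX" is
the same shorthand used above for the last minor release of a branch.

```python
def upgrade_path(releases, start, target):
    """List the steps needed when each release reads only prepared predecessor pages."""
    i, j = releases.index(start), releases.index(target)
    steps = []
    for cur, nxt in zip(releases[i:j], releases[i + 1:j + 1]):
        # Each hop needs the last minor release of `cur` to prepare pages for `nxt`.
        steps.append(f"{cur}.MAX (prepare pages for {nxt})")
    steps.append(target)
    return steps


print(upgrade_path(["8.3", "8.4", "8.5", "8.6"], "8.3", "8.6"))
```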

It seems to me that if there is any way to put all of the logic to
handle old page versions in the new code that would be much better,
especially if it's an optional feature that can be compiled in or not.
 Then when it's time to upgrade from 8.3 to 8.6 you could do:

./configure --with-upgrade-83 --with-upgrade-84 --with-upgrade-85

but if you don't need the code to handle old page versions you can:

./configure --without-upgrade-85

Admittedly, this requires making the new code capable of rearranging
pages to create free space when necessary, and to be able to continue
to execute queries while doing it, but ways of doing this have been
proposed.  The only uncertainty is as to whether the performance and
code complexity can be kept manageable, but I don't believe that
question has been explored to the point where we should be ready to
declare defeat.

...Robert



Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > I envision a similar system where we have utilities to guarantee all
> > pages have enough free space, and all pages are the current version,
> > before allowing an upgrade-in-place to the next version.  Such a
> > consistent API will make the job for users easier and our job simpler,
> > and with upgrade-in-place, where we have limited time and resources to
> > code this for each release, simplicity is important.
> 
> An external utility doesn't seem like the right way to approach it.
> For example, given the need to ensure X amount of free space in each
> page, the only way to guarantee that would be to shut down the database
> while you run the utility over all the pages --- otherwise somebody
> might fill some page up again.  And that completely defeats the purpose,
> which is to have minimal downtime during upgrade.
> 
> I think we can have a notion of pre-upgrade maintenance, but it would
> have to be integrated into normal operations.  For instance, if
> conversion to 8.4 requires extra free space, we'd make late releases
> of 8.3.x not only be able to force that to occur, but also tweak the
> normal code paths to maintain that minimum free space.
> 
> The full concept as I understood it (dunno why Bruce left all these
> details out of his message) went like this:

Exactly.  I didn't go into the implementation details to make it easier
for people to see my general goals.  Tom's implementation steps are the
correct approach, assuming we can get agreement on the general goals.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Robert Haas
> That's all fine and dandy, except that it presumes that you can perform
> SELECT/UPDATE/DELETE on V3 tuple versions; you can't just pretend that
> A-E aren't there until they get converted.  Which is exactly the
> overhead we were looking to avoid.

I don't understand this comment at all.  Unless you have some sort of
magical wand in your back pocket that will instantaneously transform
the entire database, there is going to be a period of time when you
have to cope with both V3 and V4 pages.  ISTM that what we should be
talking about here is:

(1) How are we going to do that in a way that imposes near-zero
overhead once the entire database has been converted?
(2) How are we going to do that in a way that is minimally invasive to the code?
(3) Can we accomplish (1) and (2) while still retaining somewhat
reasonable performance for V3 pages?

Zdenek's initial proposal did this by replacing all of the tuple
header macros with functions that were conditionalized on page
version.  I think we agree that's not going to work.  That doesn't
mean that there is no approach that can work, and we were discussing
possible ways to make it work upthread until the thread got hijacked
to discuss the right way of handling page expansion.  Now that it
seems we agree that a transaction can be used to move tuples onto new
pages, I think we'd be well served to stop talking about page
expansion and get back to the original topic: where and how to insert
the hooks for V3 tuple handling.

> (Another small issue is exactly when you convert the index entries,
> should you be faced with an upgrade that requires that.)

Zdenek set out his thoughts on this point upthread, no need to rehash here.

...Robert



Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Zdenek Kotala

Tom Lane wrote:
> "Robert Haas" <[EMAIL PROTECTED]> writes:
>> To spell this out in more detail:
>>
>> Suppose page 123 is a V3 page containing 6 tuples A, B, C, D, E, and
>> F.  We examine the page and determine that if we convert this to a V4
>> page, only five tuples will fit.  So we need to get rid of one of the
>> tuples.  We begin a transaction and choose F as the victim.  Searching
>> the FSM, we discover that page 456 is a V4 page with available free
>> space.  We pin and lock pages 123 and 456 just as if we were doing a
>> heap_update.  We create F', the V4 version of F, and write it onto
>> page 456.  We set xmax on the original F.  We perform the corresponding
>> index updates and commit the transaction.
>>
>> Time passes.  Eventually F becomes dead.  We reclaim the space
>> previously used by F, and page 123 now contains only 5 tuples.  This
>> is exactly what we needed in order to convert page 123 to a V4 page, so
>> we do.
>
> That's all fine and dandy, except that it presumes that you can perform
> SELECT/UPDATE/DELETE on V3 tuple versions; you can't just pretend that
> A-E aren't there until they get converted.  Which is exactly the
> overhead we were looking to avoid.


We want to avoid overhead on V$latest$ tuples, but I guess a small performance 
gap on old tuples is acceptable. The only way (that I see now) for it to work 
is to have multi-page-version processing, with an old tuple being converted 
when PageGetHeapTuple is called.


However, as Heikki mentioned, tuple and page conversion is basic and the same 
for all upgrade methods, and it should be done first.


Zdenek




--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql




Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Tom Lane
"Robert Haas" <[EMAIL PROTECTED]> writes:
> To spell this out in more detail:

> Suppose page 123 is a V3 page containing 6 tuples A, B, C, D, E, and
> F.  We examine the page and determine that if we convert this to a V4
> page, only five tuples will fit.  So we need to get rid of one of the
> tuples.  We begin a transaction and choose F as the victim.  Searching
> the FSM, we discover that page 456 is a V4 page with available free
> space.  We pin and lock pages 123 and 456 just as if we were doing a
> heap_update.  We create F', the V4 version of F, and write it onto
> page 456.  We set xmax on the original F.  We perform the corresponding
> index updates and commit the transaction.

> Time passes.  Eventually F becomes dead.  We reclaim the space
> previously used by F, and page 123 now contains only 5 tuples.  This
> is exactly what we needed in order to convert page 123 to a V4 page, so
> we do.

That's all fine and dandy, except that it presumes that you can perform
SELECT/UPDATE/DELETE on V3 tuple versions; you can't just pretend that
A-E aren't there until they get converted.  Which is exactly the
overhead we were looking to avoid.

(Another small issue is exactly when you convert the index entries,
should you be faced with an upgrade that requires that.)

regards, tom lane



Re: [HACKERS] [WIP] In-place upgrade

2008-11-05 Thread Robert Haas
>>> >> The problem is how to move a tuple from one page to another and keep
>>> >> indexes in sync. One solution is to perform something like an "update"
>>> >> operation on the tuple. But you need an exclusive lock on the page and
>>> >> the pin count has to be zero. And the question is where that is a safe
>>> >> operation.
>>> >
>>> > But doesn't this problem go away if you do it in a transaction?  You
>>> > set xmax on the old tuple, write the new tuple, and add index entries
>>> > just as you would for a normal update.
>>>
>>> But that doesn't actually solve the overflow problem on the old page...
>>
>> Sure it does. You move just enough tuples that you can convert the page
>> without an overflow.
>
> setting the xmax on a tuple doesn't "move" the tuple

Nobody said it did.  I think this would have been more clear if you
had quoted my whole email instead of stopping in the middle:

>> But doesn't this problem go away if you do it in a transaction?  You
>> set xmax on the old tuple, write the new tuple, and add index entries
>> just as you would for a normal update.
>>
>> When the old tuple is no longer visible to any transaction, you nuke it.

To spell this out in more detail:

Suppose page 123 is a V3 page containing 6 tuples A, B, C, D, E, and
F.  We examine the page and determine that if we convert this to a V4
page, only five tuples will fit.  So we need to get rid of one of the
tuples.  We begin a transaction and choose F as the victim.  Searching
the FSM, we discover that page 456 is a V4 page with available free
space.  We pin and lock pages 123 and 456 just as if we were doing a
heap_update.  We create F', the V4 version of F, and write it onto
page 456.  We set xmax on the original F.  We perform the corresponding
index updates and commit the transaction.

Time passes.  Eventually F becomes dead.  We reclaim the space
previously used by F, and page 123 now contains only 5 tuples.  This
is exactly what we needed in order to convert page 123 to a V4 page, so
we do.
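
The same sequence as a toy Python model, for illustration only. Pages are
dicts, capacities are counted in tuples rather than bytes, and the helper
names are invented here; locking, the FSM and index updates are left out.

```python
# Toy model: a page maps tuple name -> state; sizes are counted in tuples.
V3, V4 = 3, 4


def new_page(version, capacity, names=()):
    return {"version": version, "capacity": capacity,
            "tuples": {n: "live" for n in names}}


def move_tuple(src, dst, name):
    """Like an update: write the new version elsewhere, set xmax on the old one."""
    assert len(dst["tuples"]) < dst["capacity"], "destination page is full"
    dst["tuples"][name + "'"] = "live"
    src["tuples"][name] = "xmax set"      # still occupies space on src


def prune_dead(page):
    """Once no transaction can see the old versions, reclaim their space."""
    page["tuples"] = {n: s for n, s in page["tuples"].items() if s == "live"}


def try_convert(page, v4_capacity):
    """Convert in place only if everything still on the page fits as V4."""
    if len(page["tuples"]) > v4_capacity:
        return False
    page["version"], page["capacity"] = V4, v4_capacity
    return True


page123 = new_page(V3, 6, "ABCDEF")
page456 = new_page(V4, 5, "XY")

print(try_convert(page123, 5))   # False: six tuples, room for five
move_tuple(page123, page456, "F")
print(try_convert(page123, 5))   # False: old F has xmax set but is not reclaimed
prune_dead(page123)              # "time passes", F is dead
print(try_convert(page123, 5))   # True
```

Note that between the move and the prune, page 123 is still a V3 page with
live tuples A-E on it, so anything reading it in that window has to
understand V3 tuples.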

...Robert



Re: [HACKERS] [WIP] In-place upgrade

2008-11-05 Thread Gregory Stark
Martijn van Oosterhout <[EMAIL PROTECTED]> writes:

> On Wed, Nov 05, 2008 at 09:41:52PM +, Gregory Stark wrote:
>> "Robert Haas" <[EMAIL PROTECTED]> writes:
>> 
>> >> The problem is how to move a tuple from one page to another and keep
>> >> indexes in sync. One solution is to perform something like an "update"
>> >> operation on the tuple. But you need an exclusive lock on the page and
>> >> the pin count has to be zero. And the question is where that is a safe
>> >> operation.
>> >
>> > But doesn't this problem go away if you do it in a transaction?  You
>> > set xmax on the old tuple, write the new tuple, and add index entries
>> > just as you would for a normal update.
>> 
>> But that doesn't actually solve the overflow problem on the old page...
>
> Sure it does. You move just enough tuples that you can convert the page
> without an overflow.

setting the xmax on a tuple doesn't "move" the tuple


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!



Re: [HACKERS] [WIP] In-place upgrade

2008-11-05 Thread Martijn van Oosterhout
On Wed, Nov 05, 2008 at 09:41:52PM +, Gregory Stark wrote:
> "Robert Haas" <[EMAIL PROTECTED]> writes:
> 
> >> Problem is how to move tuple from page to another and keep indexes in sync.
> >> One solution is to perform some think like "update" operation on the tuple.
> >> But you need exclusive lock on the page and pin counter have to be zero. 
> >> And
> >> question is where it is safe operation.
> >
> > But doesn't this problem go away if you do it in a transaction?  You
> > set xmax on the old tuple, write the new tuple, and add index entries
> > just as you would for a normal update.
> 
> But that doesn't actually solve the overflow problem on the old page...

Sure it does. You move just enough tuples that you can convert the page
without an overflow.

Have a nice day,
-- 
Martijn van Oosterhout   <[EMAIL PROTECTED]>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while 
> boarding. Thank you for flying nlogn airlines.




Re: [HACKERS] [WIP] In-place upgrade

2008-11-05 Thread Gregory Stark
"Robert Haas" <[EMAIL PROTECTED]> writes:

>> Problem is how to move tuple from page to another and keep indexes in sync.
>> One solution is to perform some think like "update" operation on the tuple.
>> But you need exclusive lock on the page and pin counter have to be zero. And
>> question is where it is safe operation.
>
> But doesn't this problem go away if you do it in a transaction?  You
> set xmax on the old tuple, write the new tuple, and add index entries
> just as you would for a normal update.

But that doesn't actually solve the overflow problem on the old page...

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!



Re: [HACKERS] [WIP] In-place upgrade

2008-11-05 Thread Robert Haas
> Problem is how to move tuple from page to another and keep indexes in sync.
> One solution is to perform some think like "update" operation on the tuple.
> But you need exclusive lock on the page and pin counter have to be zero. And
> question is where it is safe operation.

But doesn't this problem go away if you do it in a transaction?  You
set xmax on the old tuple, write the new tuple, and add index entries
just as you would for a normal update.

When the old tuple is no longer visible to any transaction, you nuke it.

...Robert



Re: [HACKERS] [WIP] In-place upgrade

2008-11-05 Thread Tom Lane
Zdenek Kotala <[EMAIL PROTECTED]> writes:
> Martijn van Oosterhout wrote:
>> Is this really such a big deal? You do the null-update on the last
>> tuple of the page and then you do have enough room. So Phase one moves
>> a few tuples to make room. Phase 2 actually converts the pages inplace.

> Problem is how to move tuple from page to another and keep indexes in
> sync. One solution is to perform some think like "update" operation on
> the tuple. But you need exclusive lock on the page and pin counter
> have to be zero. And question is where it is safe operation.

Hmm.  Well, it may be a nasty problem but you have to find a solution.
We're not going to guarantee that no update ever expands the data ...

regards, tom lane



Re: [HACKERS] [WIP] In-place upgrade

2008-11-05 Thread Zdenek Kotala

Martijn van Oosterhout wrote:
> On Wed, Nov 05, 2008 at 03:04:42PM +0100, Zdenek Kotala wrote:
>> Greg Stark wrote:
>> It is exceptional case between V3 and V4 and only on heap, because you save
>> in varlena. But between V4 and V5 we will lost another 4 bytes in a page
>> header -> page header will be 28 bytes long but tuple size is same.
>>
>> Try to get raw free space on each page in 8.3 database and you probably see
>> a lot of pages where free space is 0. My last experience is something about
>> 1-2% of pages.
>
> Is this really such a big deal? You do the null-update on the last
> tuple of the page and then you do have enough room. So Phase one moves
> a few tuples to make room. Phase 2 actually converts the pages inplace.

The problem is how to move a tuple from one page to another and keep the
indexes in sync. One solution is to perform something like an "update"
operation on the tuple. But you need an exclusive lock on the page, and the
pin count has to be zero. And the question is where that is a safe operation.


Zdenek


--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql




Re: [HACKERS] [WIP] In-place upgrade

2008-11-05 Thread Martijn van Oosterhout
On Wed, Nov 05, 2008 at 03:04:42PM +0100, Zdenek Kotala wrote:
> Greg Stark wrote:
> It is exceptional case between V3 and V4 and only on heap, because you save 
> in varlena. But between V4 and V5 we will lost another 4 bytes in a page 
> header -> page header will be 28 bytes long but tuple size is same.
> 
> Try to get raw free space on each page in 8.3 database and you probably see 
> a lot of pages where free space is 0. My last experience is something about 
> 1-2% of pages.

Is this really such a big deal? You do the null-update on the last
tuple of the page and then you do have enough room. So Phase one moves
a few tuples to make room. Phase 2 actually converts the pages inplace.

Have a nice day,
-- 
Martijn van Oosterhout   <[EMAIL PROTECTED]>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while 
> boarding. Thank you for flying nlogn airlines.




Re: [HACKERS] [WIP] In-place upgrade

2008-11-05 Thread Zdenek Kotala

Greg Stark wrote:
> I don't think this really qualifies as "in place upgrade" since it would
> mean creating a whole second copy of all your data. And it's only online
> got read-only queries too.
>
> I think we need a way to upgrade the pages in place and deal with any
> overflow data as exceptional cases or else there's hardly much point in
> the exercise.

It is an exceptional case between V3 and V4, and only on the heap, because you
save space in varlena. But between V4 and V5 we will lose another 4 bytes to
the page header: the page header will be 28 bytes long, but the tuple size
stays the same.

Try getting the raw free space on each page in an 8.3 database and you will
probably see a lot of pages where the free space is 0. In my last experience
it was about 1-2% of pages.
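
A rough way to count such pages, given the header fields of each page, could
look like the Python sketch below. It is only an illustration: the function
name and the input format are made up, the 24-byte old header size is an
assumption of the sketch, and alignment padding is ignored.

```python
from typing import Iterable, List, Tuple

OLD_HEADER_BYTES = 24  # page header size before the change (assumed here)
NEW_HEADER_BYTES = 28  # page header size quoted above


def pages_without_room(pages: Iterable[Tuple[int, int]],
                       extra: int = NEW_HEADER_BYTES - OLD_HEADER_BYTES) -> List[int]:
    """Return positions of pages whose raw free space is below `extra` bytes.

    Each page is given as (pd_lower, pd_upper); the raw free space is the
    gap between the end of the line pointer array and the start of tuple data.
    """
    return [i for i, (lower, upper) in enumerate(pages) if upper - lower < extra]
```

A page with exactly 4 free bytes still fits in this model; only pages with
less than that would need a tuple moved away first.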


Zdenek



Re: [HACKERS] [WIP] In-place upgrade

2008-11-05 Thread Greg Stark
I don't think this really qualifies as "in place upgrade" since it
would mean creating a whole second copy of all your data. And it's
only online for read-only queries too.


I think we need a way to upgrade the pages in place and deal with any  
overflow data as exceptional cases or else there's hardly much point  
in the exercise.


greg

On 5 Nov 2008, at 07:32 AM, "Robert Haas" <[EMAIL PROTECTED]> wrote:

>> An old page which never goes away. New page formats are introduced for a
>> reason -- to support new features. An old page lying around indefinitely means
>> some pages can't support those new features. Just as an example, DBAs may be
>> surprised to find out that large swathes of their database are still not
>> protected by CRC checksums months or years after having upgraded to 8.4 (or
>> even 8.5 or 8.6 or ...). They would certainly want a way to ensure all their
>> data is upgraded.
>
> OK, I see your point.  In the absence of any old snapshots,
> convert-on-write allows you to forcibly upgrade the whole table by
> rewriting all of the tuples into new pages:
>
> UPDATE table SET col = col
>
> In the absence of page expansion, you can put logic into VACUUM to
> upgrade each page in place.
>
> If you have both old snapshots that you can't get rid of, and page
> expansion, then you have a big problem, which I guess brings us back
> to Heikki's point.
>
> ...Robert




Re: [HACKERS] [WIP] In-place upgrade

2008-11-05 Thread Zdenek Kotala

Tom Lane wrote:

> I concur that I don't want to see this patch adding more than the
> absolute unavoidable minimum of overhead for data that meets the
> "current" layout definition.  I'm disturbed by the proposal to stick
> overhead into tuple header access, for example.

OK, I agree that it is overhead. However, the patch also contains a Tuple and
Page API cleanup, which is a general improvement. All functions should use
HeapTuple access, not HeapTupleHeader. I used functions in the patch because I
added multi-version access, but they can be macros.

The main change to the page API is the addition of two functions,
PageGetHeapTuple and PageGetIndexTuple. I also added functions like
PageItemIsDead and so on. These changes are not only related to upgrade.

I accept your complaints about tuples, but I think we should have a
multi-version page access method. The main advantage is that indexes are
readable without any problem. It helps mostly with TOAST chunk data access,
and it is necessary for retoasting. OK, it will work only until somebody
changes the btree on-disk format, but for now it helps.


Zdenek



Re: [HACKERS] [WIP] In-place upgrade

2008-11-05 Thread Zdenek Kotala

Heikki Linnakangas wrote:
> Zdenek Kotala wrote:
>
> We've talked about this many times before, so I'm sure you know what my
> opinion is. Let me phrase it one more time:
>
> 1. You *will* need a function to convert a page from old format to new
> format. We do want to get rid of the old format pages eventually,
> whether it's during VACUUM, whenever a page is read in, or by using an
> extra utility. And that process needs to online. Please speak up now if
> you disagree with that.

Yes, I agree. The basic idea is to create a new empty page and copy and
convert the tuples into it. This new page will then overwrite the old one. I
already have code which converts a heap table (excluding arrays and composite
datatypes).

> 2. It follows from point 1, that you *will* need to solve the problems
> with pages where the data doesn't fit on the page in new format, as well
> as converting TOAST data.

Yes and no. It depends on whether we want to live with old pages forever. But
I think converting all pages to the newest version is a good idea.

> We've discussed various solutions to those problems; it's not
> insurmountable. For the "data doesn't fit anymore" problem, a fairly
> simple solution is to run a pre-upgrade utility in the old version, that
> reserves some free space on each page, to make sure everything fits
> after converting to new format.

I think it will not work. You also need to prevent PostgreSQL from putting any
extra data on the page, which requires modifying PostgreSQL code in the old
branches.

> For TOAST, you can retoast tuples when
> the heap page is read in.

Yes, you have to retoast it, which is the only possible method, but the problem
is that you need a working TOAST index ... yeah, indexes are a different story.

> I'm not sure what the problem with indexes is,
> but you can split pages if necessary, for example.

Indexes are a different story. As a first step I prefer to use REINDEX. But in
the future I would prefer to extend pg_am and add an ampageconvert column which
will point to a conversion function. Maybe we can extend it now and keep this
column empty.

> Assuming everyone agrees with point 1, could we focus on these issues?

Yes, OK. I'm going to clean up the code I have and I will send it soon. Tuple
conversion is already part of the patch which I already sent. See
access/heapam/htup_03.c.


Zdenek


--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql




Re: [HACKERS] [WIP] In-place upgrade

2008-11-05 Thread Robert Haas
> An old page which never goes away. New page formats are introduced for a
> reason -- to support new features. An old page lying around indefinitely means
> some pages can't support those new features. Just as an example, DBAs may be
> surprised to find out that large swathes of their database are still not
> protected by CRC checksums months or years after having upgraded to 8.4 (or
> even 8.5 or 8.6 or ...). They would certainly want a way to ensure all their
> data is upgraded.

OK, I see your point.  In the absence of any old snapshots,
convert-on-write allows you to forcibly upgrade the whole table by
rewriting all of the tuples into new pages:

UPDATE table SET col = col

In the absence of page expansion, you can put logic into VACUUM to
upgrade each page in place.

If you have both old snapshots that you can't get rid of, and page
expansion, then you have a big problem, which I guess brings us back
to Heikki's point.
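
The three cases above can be written down as a small decision function.
This is only a Python sketch of the reasoning in this mail; the function name
and the returned labels are invented for illustration and nothing here
corresponds to existing code.

```python
def upgrade_strategy(old_snapshots: bool, page_expansion: bool) -> str:
    """Pick a way to force every page of a table into the new format."""
    if old_snapshots and page_expansion:
        # Neither trick works: the data may not fit in place, and the
        # old tuple versions cannot be thrown away yet.
        return "unsolved"
    if not page_expansion:
        # Everything still fits, so VACUUM can convert each page in place.
        return "vacuum-in-place"
    # Pages may grow, but nothing pins the old versions, so rewriting
    # every tuple (UPDATE table SET col = col) moves the data to new pages.
    return "rewrite-tuples"
```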

...Robert



Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Joshua D. Drake

Gregory Stark wrote:
> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>
>> Gregory Stark wrote:
>>> "Robert Haas" <[EMAIL PROTECTED]> writes:
>>
>>> An old page which never goes away. New page formats are introduced for a
>>> reason -- to support new features. An old page lying around indefinitely means
>>> some pages can't support those new features. Just as an example, DBAs may be
>>> surprised to find out that large swathes of their database are still not
>>> protected by CRC checksums months or years after having upgraded to 8.4 (or
>>> even 8.5 or 8.6 or ...). They would certainly want a way to ensure all their
>>> data is upgraded.
>>
>> Then provide a manual mechanism to convert all pages?
>
> The origin of this thread was the dispute over this claim:
>
> 1. You *will* need a function to convert a page from old format to new
> format. We do want to get rid of the old format pages eventually, whether
> it's during VACUUM, whenever a page is read in, or by using an extra
> utility. And that process needs to online. Please speak up now if you
> disagree with that.

I agree.

Joshua D. Drake



Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Gregory Stark
"Joshua D. Drake" <[EMAIL PROTECTED]> writes:

> Gregory Stark wrote:
>> "Robert Haas" <[EMAIL PROTECTED]> writes:
>
>> An old page which never goes away. New page formats are introduced for a
>> reason -- to support new features. An old page lying around indefinitely 
>> means
>> some pages can't support those new features. Just as an example, DBAs may be
>> surprised to find out that large swathes of their database are still not
>> protected by CRC checksums months or years after having upgraded to 8.4 (or
>> even 8.5 or 8.6 or ...). They would certainly want a way to ensure all their
>> data is upgraded.
>
> Then provide a manual mechanism to convert all pages?

The origin of this thread was the dispute over this claim:

1. You *will* need a function to convert a page from old format to new
format. We do want to get rid of the old format pages eventually, whether
it's during VACUUM, whenever a page is read in, or by using an extra
utility. And that process needs to online. Please speak up now if you
disagree with that.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!



Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Joshua D. Drake

Gregory Stark wrote:
> "Robert Haas" <[EMAIL PROTECTED]> writes:
>
> An old page which never goes away. New page formats are introduced for a
> reason -- to support new features. An old page lying around indefinitely means
> some pages can't support those new features. Just as an example, DBAs may be
> surprised to find out that large swathes of their database are still not
> protected by CRC checksums months or years after having upgraded to 8.4 (or
> even 8.5 or 8.6 or ...). They would certainly want a way to ensure all their
> data is upgraded.

Then provide a manual mechanism to convert all pages?

Joshua D. Drake



Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Gregory Stark
"Robert Haas" <[EMAIL PROTECTED]> writes:

>>> No, that's not what I'm suggesting.  My thought was that any V3 page
>>> would be treated as if it were completely full, with the exception of
>>> a completely empty page which can be reinitialized as a V4 page.  So
>>> you would never add any tuples to a V3 page, but you would need to
>>> update xmax, hint bits, etc.  Eventually when all the tuples were dead
>>> you could reuse the page.
>>
>> But there's no guarantee that will ever happen. Heikki claimed you would need
>> a mechanism to convert the page some day and you said you proposed a system
>> where that wasn't true.
>
> What's the scenario you're concerned about?  An old snapshot that
> never goes away?

An old page which never goes away. New page formats are introduced for a
reason -- to support new features. An old page lying around indefinitely means
some pages can't support those new features. Just as an example, DBAs may be
surprised to find out that large swathes of their database are still not
protected by CRC checksums months or years after having upgraded to 8.4 (or
even 8.5 or 8.6 or ...). They would certainly want a way to ensure all their
data is upgraded.

> Can we lock the old and new pages, move the tuple to a V4 page, and
> update index entries without changing xmin/xmax?

Not exactly. But regardless -- the point is we need to do something.

(And then the argument goes that since we *have* to do that then we needn't
bother with doing anything else. At least if we do it's just an optimization
over just doing the whole page right away.)

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!



Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Robert Haas
>> No, that's not what I'm suggesting.  My thought was that any V3 page
>> would be treated as if it were completely full, with the exception of
>> a completely empty page which can be reinitialized as a V4 page.  So
>> you would never add any tuples to a V3 page, but you would need to
>> update xmax, hint bits, etc.  Eventually when all the tuples were dead
>> you could reuse the page.
>
> But there's no guarantee that will ever happen. Heikki claimed you would need
> a mechanism to convert the page some day and you said you proposed a system
> where that wasn't true.

What's the scenario you're concerned about?  An old snapshot that
never goes away?

Can we lock the old and new pages, move the tuple to a V4 page, and
update index entries without changing xmin/xmax?

...Robert



Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Gregory Stark
"Robert Haas" <[EMAIL PROTECTED]> writes:

>>> Maybe.  The difference is that I'm talking about converting tuples,
>>> not pages, so "What happens when the data doesn't fit on the new
>>> page?" is a meaningless question.
>>
>> No it's not, because as you pointed out you still need a way for the user to
>> force it to happen sometime. Unless you're going to be happy with telling
>> users they need to update all their tuples which would not be an online
>> process.
>>
>> In any case it sounds like you're saying you want to allow multiple versions
>> of tuples on the same page -- which a) would be much harder and b) doesn't
>> solve the problem since the page still has to be converted sometime anyways.
>
> No, that's not what I'm suggesting.  My thought was that any V3 page
> would be treated as if it were completely full, with the exception of
> a completely empty page which can be reinitialized as a V4 page.  So
> you would never add any tuples to a V3 page, but you would need to
> update xmax, hint bits, etc.  Eventually when all the tuples were dead
> you could reuse the page.

But there's no guarantee that will ever happen. Heikki claimed you would need
a mechanism to convert the page some day and you said you proposed a system
where that wasn't true.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!



Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Robert Haas
>> Maybe.  The difference is that I'm talking about converting tuples,
>> not pages, so "What happens when the data doesn't fit on the new
>> page?" is a meaningless question.
>
> No it's not, because as you pointed out you still need a way for the user to
> force it to happen sometime. Unless you're going to be happy with telling
> users they need to update all their tuples which would not be an online
> process.
>
> In any case it sounds like you're saying you want to allow multiple versions
> of tuples on the same page -- which a) would be much harder and b) doesn't
> solve the problem since the page still has to be converted sometime anyways.

No, that's not what I'm suggesting.  My thought was that any V3 page
would be treated as if it were completely full, with the exception of
a completely empty page which can be reinitialized as a V4 page.  So
you would never add any tuples to a V3 page, but you would need to
update xmax, hint bits, etc.  Eventually when all the tuples were dead
you could reuse the page.
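
In Python-flavoured pseudocode, the free-space rule I have in mind is
roughly the following.  This is a sketch only: free_space_for_insert and its
parameters are made-up names, and a real version would live wherever free
space is looked up for inserts.

```python
from typing import Tuple


def free_space_for_insert(version: int, n_tuples: int, raw_free: int,
                          empty_v4_free: int) -> Tuple[int, int]:
    """Return (page version after the check, free space offered to inserts)."""
    if version >= 4:
        return version, raw_free
    if n_tuples == 0:
        return 4, empty_v4_free   # empty old page: reinitialize as V4
    return version, 0             # old page with tuples: pretend it is full
```

Updates of xmax and hint bits on a V3 page are not covered by this rule,
since they do not add tuples to the page.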

...Robert



Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Gregory Stark
"Robert Haas" <[EMAIL PROTECTED]> writes:

>>> Well, I just proposed an approach that doesn't work this way, so I
>>> guess I'll have to put myself in the disagree category, or anyway yet
>>> to be convinced.  As long as you can move individual tuples onto new
>>> pages, you can eventually empty V3 pages and reinitialize them as new,
>>> empty V4 pages.  You can force that process along via, say, VACUUM,
>>
>> No, if you can force that process along via some command, whatever it is, 
>> then
>> you're still in the category he described.
>
> Maybe.  The difference is that I'm talking about converting tuples,
> not pages, so "What happens when the data doesn't fit on the new
> page?" is a meaningless question.  

No it's not, because as you pointed out you still need a way for the user to
force it to happen sometime. Unless you're going to be happy with telling
users they need to update all their tuples which would not be an online
process.

In any case it sounds like you're saying you want to allow multiple versions
of tuples on the same page -- which a) would be much harder and b) doesn't
solve the problem since the page still has to be converted sometime anyways.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!



Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Gregory Stark
"Robert Haas" <[EMAIL PROTECTED]> writes:

>> We've talked about this many times before, so I'm sure you know what my
>> opinion is. Let me phrase it one more time:
>>
>> 1. You *will* need a function to convert a page from old format to new
>> format. We do want to get rid of the old format pages eventually, whether
>> it's during VACUUM, whenever a page is read in, or by using an extra
>> utility. And that process needs to online. Please speak up now if you
>> disagree with that.
>
> Well, I just proposed an approach that doesn't work this way, so I
> guess I'll have to put myself in the disagree category, or anyway yet
> to be convinced.  As long as you can move individual tuples onto new
> pages, you can eventually empty V3 pages and reinitialize them as new,
> empty V4 pages.  You can force that process along via, say, VACUUM,

No, if you can force that process along via some command, whatever it is, then
you're still in the category he described.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!



Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Robert Haas
>> Well, I just proposed an approach that doesn't work this way, so I
>> guess I'll have to put myself in the disagree category, or anyway yet
>> to be convinced.  As long as you can move individual tuples onto new
>> pages, you can eventually empty V3 pages and reinitialize them as new,
>> empty V4 pages.  You can force that process along via, say, VACUUM,
>
> No, if you can force that process along via some command, whatever it is, then
> you're still in the category he described.

Maybe.  The difference is that I'm talking about converting tuples,
not pages, so "What happens when the data doesn't fit on the new
page?" is a meaningless question.  Since that seemed to be Heikki's
main concern, I thought we must be talking about different things.  My
thought was that the code path for converting a tuple would be very
similar to what heap_update does today, and large tuples would be
handled via TOAST just as they are now - by converting the relation
one tuple at a time, you might end up with a new relation that has
either more or fewer pages than the old relation, and it really
doesn't matter which.

I haven't really thought through all of the other kinds of things that
might need to be converted, though.  That's where it would be useful
for someone more experienced to weigh in on indexes, etc.

...Robert



Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Robert Haas
> That's sane *if* you can guarantee that only negligible overhead is
> added for accessing data that is in the up-to-date format.  I don't
> think that will be the case if we start putting version checks into
> every tuple access macro.

Yes, the point is that you'll read the page as V3 or V4, whichever it
is, but if it's V3, you'll convert the tuples to V4 format before you
try to do anything with them (for example by modifying
ExecStoreTuple to copy any V3 tuple into a palloc'd buffer, which fits
nicely into what that function already does).
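
As a toy illustration of that idea in Python (not the real ExecStoreTuple;
store_tuple, convert_v3_tuple and the dictionary used as a slot are all
stand-ins invented for this sketch), the version check happens once when the
tuple is stored, and everything downstream sees only the current format:

```python
from typing import Any, Dict

CURRENT_VERSION = 4


def convert_v3_tuple(raw: bytes) -> bytes:
    """Placeholder for the real V3 -> V4 conversion; here it only copies."""
    return bytes(raw)


def store_tuple(slot: Dict[str, Any], raw: bytes, page_version: int) -> None:
    """Put a tuple into a slot, converting old-format tuples on the way in."""
    if page_version == CURRENT_VERSION:
        slot["tuple"] = raw          # fast path: point at the tuple as it is
        slot["private_copy"] = False
    elif page_version == 3:
        slot["tuple"] = convert_v3_tuple(raw)  # slow path: converted copy
        slot["private_copy"] = True
    else:
        raise ValueError(f"unsupported page version {page_version}")
```

The cost for current-format data in this model is one comparison per stored
tuple, which is the property Tom is asking for.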

...Robert



Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Tom Lane
"Robert Haas" <[EMAIL PROTECTED]> writes:
> Well, I just proposed an approach that doesn't work this way, so I
> guess I'll have to put myself in the disagree category, or anyway yet
> to be convinced.  As long as you can move individual tuples onto new
> pages, you can eventually empty V3 pages and reinitialize them as new,
> empty V4 pages.  You can force that process along via, say, VACUUM,
> but in the meantime you can still continue to read the old pages
> without being forced to change them to the new format.  That's not the
> only possible approach, but it's not obvious to me that it's insane.
> If you think it's a non-starter, it would be good to know why.

That's sane *if* you can guarantee that only negligible overhead is
added for accessing data that is in the up-to-date format.  I don't
think that will be the case if we start putting version checks into
every tuple access macro.

regards, tom lane



Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Robert Haas
> We've talked about this many times before, so I'm sure you know what my
> opinion is. Let me phrase it one more time:
>
> 1. You *will* need a function to convert a page from old format to new
> format. We do want to get rid of the old format pages eventually, whether
> it's during VACUUM, whenever a page is read in, or by using an extra
> utility. And that process needs to online. Please speak up now if you
> disagree with that.

Well, I just proposed an approach that doesn't work this way, so I
guess I'll have to put myself in the disagree category, or anyway yet
to be convinced.  As long as you can move individual tuples onto new
pages, you can eventually empty V3 pages and reinitialize them as new,
empty V4 pages.  You can force that process along via, say, VACUUM,
but in the meantime you can still continue to read the old pages
without being forced to change them to the new format.  That's not the
only possible approach, but it's not obvious to me that it's insane.
If you think it's a non-starter, it would be good to know why.

...Robert



Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Heikki Linnakangas

Zdenek Kotala wrote:

Robert Haas napsal(a):

Really, what I'd ideally like to see here is a system where the V3
code is in essence error-recovery code.  Everything should be V4-only
unless you detect a V3 page, and then you error out (if in-place
upgrade is not enabled) or jump to the appropriate V3-aware code (if
in-place upgrade is enabled).  In theory, with a system like this, it
seems like the overhead for V4 ought to be no more than the cost of
checking the page version on each page read, which is a cheap sanity
check we'd be willing to pay for anyway, and trivial in cost.


OK. The original idea was "convert on read", which has several
problems with no easy solution. One is that the converted data may not fit on
the page, and the second big problem is how to convert TOAST table data.
Another, more general problem is how to convert indexes...


We've talked about this many times before, so I'm sure you know what my 
opinion is. Let me phrase it one more time:


1. You *will* need a function to convert a page from old format to new 
format. We do want to get rid of the old format pages eventually, 
whether it's during VACUUM, whenever a page is read in, or by using an 
extra utility. And that process needs to be online. Please speak up now if 
you disagree with that.


2. It follows from point 1, that you *will* need to solve the problems 
with pages where the data doesn't fit on the page in new format, as well 
as converting TOAST data.


We've discussed various solutions to those problems; it's not 
insurmountable. For the "data doesn't fit anymore" problem, a fairly 
simple solution is to run a pre-upgrade utility in the old version, that 
reserves some free space on each page, to make sure everything fits 
after converting to new format. For TOAST, you can retoast tuples when 
the heap page is read in. I'm not sure what the problem with indexes is, 
but you can split pages if necessary, for example.
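
The free-space reservation mentioned above comes down to simple arithmetic per
page. The sketch below is only an illustration under stated assumptions: the
function names are hypothetical, and the per-tuple and per-page growth figures
are parameters because the thread does not give the real numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bytes by which a page's contents grow when converted, assuming each
 * tuple grows by at most growth_per_tuple bytes and the page header by
 * header_growth bytes (both figures are assumptions for illustration).
 */
size_t
toy_conversion_growth(size_t ntuples, size_t growth_per_tuple,
                      size_t header_growth)
{
    /* Guard against overflow in the multiplication below. */
    assert(growth_per_tuple == 0 ||
           ntuples <= (SIZE_MAX - header_growth) / growth_per_tuple);
    return header_growth + ntuples * growth_per_tuple;
}

/*
 * Bytes a pre-upgrade utility would still have to free on this page so
 * that everything fits after conversion; 0 means the page is ready.
 */
size_t
toy_reservation_shortfall(size_t free_bytes, size_t ntuples,
                          size_t growth_per_tuple, size_t header_growth)
{
    size_t needed = toy_conversion_growth(ntuples, growth_per_tuple,
                                          header_growth);

    return needed > free_bytes ? needed - free_bytes : 0;
}
```

A pre-upgrade pass in the old version would, under these assumptions, visit
each page and move tuples elsewhere whenever the shortfall is non-zero.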


Assuming everyone agrees with point 1, could we focus on these issues?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Robert Haas
> I see. But vacuum and other internal functions access heap pages directly
> without ExecStoreTuple.

Right.  I don't think there's any getting around the fact that any
function which accesses heap pages directly is going to need
modification.  The key is to make those modifications as non-invasive
as possible.  For example, in the case of vacuum, as soon as it
detects that a V3 page has been read, it should call a special
function whose only purpose in life is to move the data out of that V3
page and onto one or more V4 pages, and return.  What you shouldn't do
is try to make the regular vacuum code handle both V3 and V4 pages,
because that will lead to code that may be slow and will almost
certainly be complicated and difficult to maintain.
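
A minimal sketch of that structure, in C, might look as follows. The types and
functions (ToyHeapPage, toy_evacuate_v3, toy_vacuum_page) are hypothetical and
a page is reduced to two counters; the point is only that the version test
happens once per page and the regular path contains no V3 logic at all.

```c
#include <assert.h>

typedef enum { PAGE_V3 = 3, PAGE_V4 = 4 } ToyPageVersion;

/* Hypothetical toy heap page: just a version and two tuple counters. */
typedef struct ToyHeapPage
{
    ToyPageVersion version;
    int live;                   /* live tuples on the page */
    int dead;                   /* dead tuples awaiting removal */
} ToyHeapPage;

typedef enum { VAC_CLEANED, VAC_EVACUATED } ToyVacResult;

/*
 * V3-only routine: hand all live tuples over for relocation onto V4
 * pages (modelled here by a counter) and leave an empty V4 page behind.
 */
void
toy_evacuate_v3(ToyHeapPage *page, int *relocated)
{
    assert(page->version == PAGE_V3);
    *relocated += page->live;
    page->live = 0;
    page->dead = 0;
    page->version = PAGE_V4;
}

/* Vacuum one page: a single version check, then V4-only code. */
ToyVacResult
toy_vacuum_page(ToyHeapPage *page, int *relocated)
{
    if (page->version != PAGE_V4)
    {
        toy_evacuate_v3(page, relocated);
        return VAC_EVACUATED;
    }

    /* Regular path: knows nothing about old layouts. */
    page->dead = 0;
    return VAC_CLEANED;
}
```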

I'll read through the rest of this when I have a bit more time.

...Robert



Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Zdenek Kotala

Robert Haas wrote:

OK. The original idea was "convert on read", which has several
problems with no easy solution. One is that the converted data may not fit on
the page, and the second big problem is how to convert TOAST table data.
Another, more general problem is how to convert indexes...

Convert on read has minimal impact on core code when the latest version is
processed. The problem is what happens when you need to migrate a tuple from
one page to a new one, modify the index, and also convert TOAST value(s)...
Some queries could then take a long time to respond, because they trigger a
lot of changes and conversions.  I think that in a corner case, requesting
one record could require converting an entire index.


I don't think I'm proposing convert on read, exactly.  If you actually
try to convert the entire page when you read it in, I think you're
doomed to failure, because, as you rightly point out, there is
absolutely no guarantee that the page contents in their new format
will still fit into one block.  I think what you want to do is convert
the structures within the page one by one as you read them out of the
page.  The proposed refactoring of ExecStoreTuple will do exactly
this, for example.


I see. But vacuum and other internal functions access heap pages directly 
without ExecStoreTuple. However, you point to an idea that I am currently 
thinking about too. Here is my version:


If you look at the new page API, it has PageGetHeapTuple. It could do the 
conversion job. The problem is that you don't have relation info there, so you 
cannot convert the data, but the transaction information can be converted.


I am thinking about modifying the HeapTupleData structure. It will have a 
pointer to the transaction info, t_transinfo, which for V4 will point into the 
tuple on the page. For V3, the PageGetHeapTuple function will allocate memory 
and put the converted data there.


ExecStoreTuple will finally convert the data, because it knows about the 
relation, and it does not make sense to convert data early. Who wants to 
convert invisible or dead data?


With this approach, tuples will be processed the same way as V4 without any 
overhead (there will be a small overhead from allocating and freeing 
HeapTupleData in some places, mostly vacuum).


Only multi-version access will be driven on a per-page basis.
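
To make the t_transinfo idea more concrete, here is a rough sketch in C. It is
not the patch's code: the Toy* types and functions are hypothetical, the
"transaction info" is reduced to xmin and xmax, the V3 field order is invented
purely so that there is something to convert, and malloc stands in for palloc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical transaction info in the two layouts (field order differs). */
typedef struct ToyTransInfoV4 { uint32_t xmin; uint32_t xmax; } ToyTransInfoV4;
typedef struct ToyTransInfoV3 { uint32_t xmax; uint32_t xmin; } ToyTransInfoV3;

typedef struct ToyHeapTupleData
{
    ToyTransInfoV4 *t_transinfo;        /* always in V4 form */
    int             t_transinfo_owned;  /* 1 if allocated and must be freed */
} ToyHeapTupleData;

/*
 * Fill in tuple->t_transinfo.  For a V4 page it points straight at the
 * on-page data; for a V3 page a converted copy is allocated.  Returns 0
 * on success, -1 on allocation failure or an unknown page version.
 */
int
toy_page_get_heap_tuple(int page_version, void *on_page,
                        ToyHeapTupleData *tuple)
{
    assert(tuple != NULL && on_page != NULL);
    tuple->t_transinfo = NULL;
    tuple->t_transinfo_owned = 0;

    if (page_version == 4)
    {
        tuple->t_transinfo = (ToyTransInfoV4 *) on_page;
        return 0;
    }
    if (page_version == 3)
    {
        const ToyTransInfoV3 *old_info = (const ToyTransInfoV3 *) on_page;
        ToyTransInfoV4 *converted = malloc(sizeof(*converted));

        if (converted == NULL)
            return -1;
        converted->xmin = old_info->xmin;
        converted->xmax = old_info->xmax;
        tuple->t_transinfo = converted;
        tuple->t_transinfo_owned = 1;
        return 0;
    }
    return -1;
}

/* Free the converted copy, if any, and reset the tuple. */
void
toy_release_heap_tuple(ToyHeapTupleData *tuple)
{
    if (tuple->t_transinfo_owned)
        free(tuple->t_transinfo);
    tuple->t_transinfo = NULL;
    tuple->t_transinfo_owned = 0;
}
```

One limitation is visible even in this toy version: a write through
t_transinfo on a V3 page changes only the copy, so setters would need a
separate path back to the page.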

Zdenek

--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql




Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Robert Haas
> OK. The original idea was "convert on read", which has several
> problems with no easy solution. One is that the converted data may not fit on
> the page, and the second big problem is how to convert TOAST table data.
> Another, more general problem is how to convert indexes...
>
> Convert on read has minimal impact on core code when the latest version is
> processed. The problem is what happens when you need to migrate a tuple from
> one page to a new one, modify the index, and also convert TOAST value(s)...
> Some queries could then take a long time to respond, because they trigger a
> lot of changes and conversions.  I think that in a corner case, requesting
> one record could require converting an entire index.

I don't think I'm proposing convert on read, exactly.  If you actually
try to convert the entire page when you read it in, I think you're
doomed to failure, because, as you rightly point out, there is
absolutely no guarantee that the page contents in their new format
will still fit into one block.  I think what you want to do is convert
the structures within the page one by one as you read them out of the
page.  The proposed refactoring of ExecStoreTuple will do exactly
this, for example.

HEAD uses a pointer into the actual buffer for a V4 tuple that comes
from an existing relation, and a pointer to a palloc'd structure for a
tuple that is generated during query execution.  The proposed
refactoring will keep these rules, plus add a new rule that if you
happen to read a V3 page, you will palloc space for a new V4 tuple
that is semantically equivalent to the V3 tuple on the page, and use
that pointer instead.  That, it seems to me, is exactly the right
balance - the PAGE is still a V3 page, but all of the tuples that the
upper-level code ever sees are V4 tuples.

I'm not sure how far this particular approach can be generalized.
ExecStoreTuple has the advantage that it already has to deal with both
direct buffer pointers and palloc'd structures, so the code doesn't
need to be much more complex to handle this case as well.  I think the
thing to do is go through and scrutinize all of the ReadBuffer call
sites and figure out an approach to each one.  I haven't looked at
your latest code yet, so you may have already done this, but just for
example, RelationGetBufferForTuple should probably just reject any V3
pages encountered as if they were full, including updating the FSM
where appropriate.  I would think that it would be possible to
implement that with almost zero performance impact.  I'm happy to look
at and discuss the problem cases with you, and hopefully others will
chime in as well since my knowledge of the code is far from
exhaustive.
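
For the RelationGetBufferForTuple case, the idea of treating V3 pages as full
could be sketched like this. This is not the real function or the real free
space map: ToyTargetPage, the fsm array and toy_get_page_for_tuple are
hypothetical stand-ins used only to show where the single version test would
sit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical candidate page for inserting a new tuple. */
typedef struct ToyTargetPage
{
    int    version;             /* page layout version */
    size_t free_bytes;          /* actual free space on the page */
} ToyTargetPage;

/*
 * Return the index of the first page that can take `needed` bytes, or -1.
 * fsm[] is a toy free space map with one entry per page.  A V3 page is
 * recorded as having no free space, so later searches skip it without
 * even looking at the page.
 */
int
toy_get_page_for_tuple(const ToyTargetPage *pages, size_t *fsm,
                       int npages, size_t needed)
{
    assert(npages >= 0);
    for (int i = 0; i < npages; i++)
    {
        if (fsm[i] < needed)
            continue;           /* the map already says "not enough room" */

        if (pages[i].version != 4)
        {
            fsm[i] = 0;         /* old layout: treat as full from now on */
            continue;
        }

        if (pages[i].free_bytes >= needed)
            return i;

        fsm[i] = pages[i].free_bytes;   /* correct a stale map entry */
    }
    return -1;
}
```

The V4 path pays for one integer comparison per candidate page, and each V3
page is rejected by the version test at most once, because its map entry is
zeroed the first time it is seen.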

...Robert



Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Zdenek Kotala

Robert Haas wrote:



Really, what I'd ideally like to see here is a system where the V3
code is in essence error-recovery code.  Everything should be V4-only
unless you detect a V3 page, and then you error out (if in-place
upgrade is not enabled) or jump to the appropriate V3-aware code (if
in-place upgrade is enabled).  In theory, with a system like this, it
seems like the overhead for V4 ought to be no more than the cost of
checking the page version on each page read, which is a cheap sanity
check we'd be willing to pay for anyway, and trivial in cost.


OK. The original idea was "convert on read", which has several problems 
with no easy solution. One is that the converted data may not fit on the page, 
and the second big problem is how to convert TOAST table data. Another, more 
general problem is how to convert indexes...


Convert on read has minimal impact on core code when the latest version is 
processed. The problem is what happens when you need to migrate a tuple from 
one page to a new one, modify the index, and also convert TOAST value(s)... 
Some queries could then take a long time to respond, because they trigger a 
lot of changes and conversions. I think that in a corner case, requesting one 
record could require converting an entire index.


Zdenek




--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql




Re: [HACKERS] [WIP] In-place upgrade

2008-11-03 Thread Robert Haas
> We already do check the page version on read-in --- see PageHeaderIsValid.

Right, but the only place this is called is in ReadBuffer_common,
which doesn't seem like a suitable place to deal with the possibility
of a V3 page since you don't yet know what you plan to do with it.
I'm not quite sure what the right solution to that problem is...

>> But I think we probably need some input from -core on this topic as well.
> I concur that I don't want to see this patch adding more than the
> absolute unavoidable minimum of overhead for data that meets the
> "current" layout definition.  I'm disturbed by the proposal to stick
> overhead into tuple header access, for example.

...but it seems like we both agree that conditionalizing heap tuple
header access on page version is not the right answer.  Based on that,
I'm going to move the "htup and bufpage API clean up" patch to
"Returned with feedback" and continue reviewing the remainder of these
patches.

As I'm looking at this, I'm realizing another problem - there is a lot
of code that looks like this:

void HeapTupleSetXmax(HeapTuple tuple, TransactionId xmax)
{
  switch(tuple->t_ver)
  {
  case 4 : tuple->t_data->t_choice.t_heap.t_xmax = xmax;
   break;
  case 3 : TPH03(tuple)->t_choice.t_heap.t_xmax = xmax;
   break;
  default: elog(PANIC, "HeapTupleSetXmax is not supported.");
  }
}

TPH03 is a macro that is casting tuple->t_data to HeapTupleHeader_03.
Unless I'm missing something, that means that given an arbitrary
pointer to HeapTuple, there is absolutely no guarantee that
tuple->t_data->t_choice actually points to that field at all.  It will
if tuple->t_ver happens to be 4 OR if HeapTupleHeader and
HeapTupleHeader_03 happen to agree on where t_choice is; otherwise it
points to some other member of HeapTupleHeader_03, or off the end of
the structure.  To me that seems unacceptably fragile, because it
means the compiler can't warn us that we're using a pointer
inappropriately.  If we truly want to be safe here then we need to
create an opaque HeapTupleHeader structure that contains only those
elements that HeapTupleHeader_03 and HeapTupleHeader_04 have in
common, and cast BOTH of them after checking the version.  That way if
someone writes a function that attempts to dereference a HeapTupleHeader
without going through the API, it will fail to compile rather than
mostly working but possibly failing on a V3 page.
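
Here is a small sketch of what such an opaque header could look like. It takes
the idea to its extreme: the generic header is an incomplete type with no
visible members at all, rather than one holding the common fields. All names
(ToyTupleHeader, ToyTupleHeaderV3, ToyTupleHeaderV4, toy_tuple_set_xmax) and
both field layouts are invented for illustration and are not the real
HeapTupleHeader definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Opaque generic header: declared but never defined. */
typedef struct ToyTupleHeader ToyTupleHeader;

/* Hypothetical concrete layouts; xmax sits at a different offset in each. */
typedef struct ToyTupleHeaderV3
{
    uint16_t infomask;
    uint32_t xmin;
    uint32_t xmax;
} ToyTupleHeaderV3;

typedef struct ToyTupleHeaderV4
{
    uint32_t xmin;
    uint32_t xmax;
    uint16_t infomask;
} ToyTupleHeaderV4;

typedef struct ToyTuple
{
    int             t_ver;      /* layout version of *t_data */
    ToyTupleHeader *t_data;     /* cannot be dereferenced directly */
} ToyTuple;

/*
 * Set xmax through the API.  BOTH versions are cast explicitly after the
 * version check.  Returns 0 on success, -1 for an unsupported version.
 */
int
toy_tuple_set_xmax(ToyTuple *tuple, uint32_t xmax)
{
    assert(tuple && tuple->t_data);
    switch (tuple->t_ver)
    {
        case 4:
            ((ToyTupleHeaderV4 *) tuple->t_data)->xmax = xmax;
            return 0;
        case 3:
            ((ToyTupleHeaderV3 *) tuple->t_data)->xmax = xmax;
            return 0;
        default:
            return -1;
    }
}
```

With these declarations, a stray tuple->t_data->xmax anywhere in the code is
rejected by the compiler as a dereference of an incomplete type, which is the
compile-time failure asked for above.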

...Robert



Re: [HACKERS] [WIP] In-place upgrade

2008-11-03 Thread Tom Lane
"Robert Haas" <[EMAIL PROTECTED]> writes:
> Really, what I'd ideally like to see here is a system where the V3
> code is in essence error-recovery code.  Everything should be V4-only
> unless you detect a V3 page, and then you error out (if in-place
> upgrade is not enabled) or jump to the appropriate V3-aware code (if
> in-place upgrade is enabled).  In theory, with a system like this, it
> seems like the overhead for V4 ought to be no more than the cost of
> checking the page version on each page read, which is a cheap sanity
> check we'd be willing to pay for anyway, and trivial in cost.

We already do check the page version on read-in --- see PageHeaderIsValid.

> But I think we probably need some input from -core on this topic as well.

I concur that I don't want to see this patch adding more than the
absolute unavoidable minimum of overhead for data that meets the
"current" layout definition.  I'm disturbed by the proposal to stick
overhead into tuple header access, for example.

regards, tom lane



Re: [HACKERS] [WIP] In-place upgrade

2008-11-03 Thread Robert Haas
> You also need to apply two other patches,
> which are located here:
> http://wiki.postgresql.org/wiki/CommitFestInProgress#Upgrade-in-place_and_related_issues
> I moved one related patch here from another category, so it is now in the correct place.

Just to confirm, which two?

> http://git.postgresql.org/?p=~davidfetter/upgrade_in_place/.git;a=snapshot;h=c72bafada59ed278ffac59657c913bc375f77808;sf=tgz
>
> It should contain everything, including yesterday's improvements (delete,
> insert, and update work; insert/update only on tables without an index).

Wow, sounds like great improvements.  I understand your difficulties
in keeping up with HEAD, but I hope we can figure out some solution,
because right now I have a diff (that I can't apply) and a tarball
(that I can't diff) and that is not ideal for reviewing.

> Yeah, that is the most difficult part :-) finding correct names for it. I
> think that each version of a structure should have a version suffix, including
> the last one. And of course the last one should also have a general name
> without a suffix; see the example:
>
> typedef struct PageHeaderData_04 { ...} PageHeaderData_04;
> typedef struct PageHeaderData_03 { ...} PageHeaderData_03;
> typedef PageHeaderData_04 PageHeaderData;
>
> This allows you to specify the version exactly in places where you need it and
> keep the general name where the version is not relevant.

That doesn't make sense to me.  If PageHeaderData and
PageHeaderData_04 are the same type, how do you decide which one to
use in any particular place in the code?

> What the suffix should look like is another question. I prefer to have 04, not only 4.
> What about PageHeaderData_V04?

I prefer "V" as a delimiter rather than "_" because that makes it more
clear that the number which follows is a version number, but I think
"_V" is overkill.  However, I don't really want to argue the point;
I'm just throwing in my $0.02 and I am sure others will have their own
views as well.

> By the way, what does YMMV mean?

"Your Mileage May Vary."
http://www.urbandictionary.com/define.php?term=YMMV

>> I am pretty skeptical of the idea that all of the HeapTuple* functions
>> can just be conditionalized on the page version and everything will
>> Just Work.  It seems like that is too low a level to be worrying about
>> such things.  Even if it happens to work for the changes between V3
>> and V4, what happens when V5 or V6 is changed in such a way that the
>> answer to HeapTupleIsWhatever is neither "Yes" nor "No", but rather
>> "Maybe" or "Seven"?  The performance hit also sounds painful.  I don't
>> have a better idea right now though...
>
> OK. Currently it works (or I hope that it works). If somebody in the future
> invents some special change, I think that in most (maybe all) cases a mapping
> will be possible.
>
> Speed is the key point. When I checked it last time I got a 1% performance
> drop in a fresh database. I think 1% is a good price for in-place online upgrade.

I think that's arguable and something that needs to be more broadly
discussed.  I wouldn't be keen to pay a 1% performance drop for this
feature, because it's not a feature I really need.  Sure, in-place
upgrade would be nice to have, but for me, dump and reload isn't a
huge problem.  It's a lot better than the 5% number you quoted
previously, but I'm not sure whether it is good enough.

I would feel more comfortable if the feature could be completely
disabled via compile-time defines.  Then you could build the system
either with or without in-place upgrade, according to your needs.  But
I don't think that's very practical with HeapTuple* as functions.  You
could conditionalize away the switch, but the function call overhead
would remain.  To get rid of that, you'd need some enormous, fragile
hack that I don't even want to contemplate.

Really, what I'd ideally like to see here is a system where the V3
code is in essence error-recovery code.  Everything should be V4-only
unless you detect a V3 page, and then you error out (if in-place
upgrade is not enabled) or jump to the appropriate V3-aware code (if
in-place upgrade is enabled).  In theory, with a system like this, it
seems like the overhead for V4 ought to be no more than the cost of
checking the page version on each page read, which is a cheap sanity
check we'd be willing to pay for anyway, and trivial in cost.
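
A minimal sketch of that shape, assuming a hypothetical compile-time switch
named TOY_INPLACE_UPGRADE (not an existing PostgreSQL define) and hypothetical
function names:

```c
#include <assert.h>

/* Hypothetical build-time switch; 1 = build with in-place upgrade support. */
#ifndef TOY_INPLACE_UPGRADE
#define TOY_INPLACE_UPGRADE 1
#endif

typedef enum
{
    READ_OK_V4,                 /* current layout: normal code path */
    READ_OK_V3_SLOW_PATH,       /* old layout: hand off to V3-aware code */
    READ_ERROR                  /* unsupported layout */
} ToyReadResult;

/* The only version test: done once per page read, not per tuple access. */
ToyReadResult
toy_check_page_on_read(int page_version)
{
    if (page_version == 4)
        return READ_OK_V4;
#if TOY_INPLACE_UPGRADE
    if (page_version == 3)
        return READ_OK_V3_SLOW_PATH;
#endif
    return READ_ERROR;
}

/* V4-only code downstream of the check can simply assert its assumption. */
int
toy_v4_only_tuple_count(int page_version, int ntuples)
{
    assert(page_version == 4);
    return ntuples;
}
```

Built with TOY_INPLACE_UPGRADE set to 0, a V3 page falls through to the error
case and the V3 branch is compiled out entirely.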

But I think we probably need some input from -core on this topic as well.

...Robert



Re: [HACKERS] [WIP] In-place upgrade

2008-11-03 Thread Zdenek Kotala

Big thanks for the review.

Robert Haas wrote:

I tried to apply this patch to CVS HEAD and it blew up all over the
place.  It doesn't seem to be intended to apply against CVS HEAD; for
example, I don't have backend/access/heap/htup.c at all, so can't
apply changes to that file.  


You also need to apply two other patches,
which are located here:
http://wiki.postgresql.org/wiki/CommitFestInProgress#Upgrade-in-place_and_related_issues
I moved one related patch here from another category, so it is now in the correct place.

The problem is that it is difficult to keep it in sync with HEAD, because they 
change a lot of things. That is the reason why I also put everything into a GIT 
repository, but ...



I was able to clone the GIT repository
with the following command...

git clone http://git.postgresql.org/git/~davidfetter/upgrade_in_place/.git

...but now I'm confused, because I don't see the changes from the diff
reflected in the resulting tree.  As you can see, I am not a git
wizard.  Any help would be appreciated.


I'm a GIT newbie; I use Mercurial for development and I applied the changes to 
GIT manually. I asked David Fetter for help with getting back a correct clone. 
In the meantime you can download a tarball.


http://git.postgresql.org/?p=~davidfetter/upgrade_in_place/.git;a=snapshot;h=c72bafada59ed278ffac59657c913bc375f77808;sf=tgz

It should contain everything, including yesterday's improvements (delete, 
insert, and update work; insert/update only on tables without an index).



Here are a few initial thoughts based mostly on reading the diff:

In the minor nit department, I don't really like the idea of
PageHeaderData_04, SizeOfPageHeaderData04, PageLayoutIsValid_04, etc.
I think the latest version should just be PageHeaderData and
SizeOfPageHeaderData, and previous versions should be, e.g.
PageHeaderDataV3.  It looks to me like this would cut a few hunks out
of this and maybe make it a bit easier to understand what is going on.
 At any rate, if we are going to stick with an explicit version number
in both versions, it should be marked in a consistent way, not _04
sometimes and just 04 other times.  My suggestion is e.g. "V4" but
YMMV.


Yeah, that is the most difficult part :-) finding correct names for it. I think 
that each version of a structure should have a version suffix, including the 
last one. And of course the last one should also have a general name without a 
suffix; see the example:


typedef struct PageHeaderData_04 { ...} PageHeaderData_04;
typedef struct PageHeaderData_03 { ...} PageHeaderData_03;
typedef PageHeaderData_04 PageHeaderData;

This allows you to specify the version exactly in places where you need it and 
keep the general name where the version is not relevant.


What the suffix should look like is another question. I prefer to have 04, not only 4.
What about PageHeaderData_V04?

By the way, what does YMMV mean?


The changes to nodeIndexscan.c and nodeSeqscan.c are worrisome to me.
It looks like the added code is (nearly?) identical in both places, so
probably it needs to be refactored to avoid code duplication.  I'm
also a bit skeptical about the idea of doing the tuple conversion
here.  Why here rather than ExecStoreTuple()?  If you decide to
convert the tuple, you can palloc the new one, pfree the old one if
ShouldFree is set, and reset shouldFree to true.


Good point. I had thought about it as one variant. And now that I look at it 
closely, it really is a much better place. It should also fix the problem of 
REINDEX not working. I will move it.



I am pretty skeptical of the idea that all of the HeapTuple* functions
can just be conditionalized on the page version and everything will
Just Work.  It seems like that is too low a level to be worrying about
such things.  Even if it happens to work for the changes between V3
and V4, what happens when V5 or V6 is changed in such a way that the
answer to HeapTupleIsWhatever is neither "Yes" nor "No", but rather
"Maybe" or "Seven"?  The performance hit also sounds painful.  I don't
have a better idea right now though...


OK. Currently it works (or I hope that it works). If somebody in the future 
invents some special change, I think that in most (maybe all) cases a mapping 
will be possible.


Speed is the key point. When I checked it last time I got a 1% performance 
drop in a fresh database. I think 1% is a good price for in-place online upgrade.



I think it's going to be absolutely imperative to begin vacuuming away
old V3 pages as quickly as possible after the upgrade.  If you go with
the approach of converting the tuple in, or just before,
ExecStoreTuple, then you're going to introduce a lot of overhead when
working with V3 pages.  I think that's fine.  You should plan to do
your in-place upgrade at 1AM on Christmas morning (or whenever your
load hits rock bottom...) and immediately start converting the
database, starting with your most important and smallest tables.  In
fact, I would look whenever possible for ways to make the V4 case a
fast-path and just accept that the system is going to labor a bit when
dealing with V3 stuff.  A

Re: [HACKERS] [WIP] In-place upgrade

2008-11-02 Thread Robert Haas
I tried to apply this patch to CVS HEAD and it blew up all over the
place.  It doesn't seem to be intended to apply against CVS HEAD; for
example, I don't have backend/access/heap/htup.c at all, so can't
apply changes to that file.  I was able to clone the GIT repository
with the following command...

git clone http://git.postgresql.org/git/~davidfetter/upgrade_in_place/.git

...but now I'm confused, because I don't see the changes from the diff
reflected in the resulting tree.  As you can see, I am not a git
wizard.  Any help would be appreciated.

Here are a few initial thoughts based mostly on reading the diff:

In the minor nit department, I don't really like the idea of
PageHeaderData_04, SizeOfPageHeaderData04, PageLayoutIsValid_04, etc.
I think the latest version should just be PageHeaderData and
SizeOfPageHeaderData, and previous versions should be, e.g.
PageHeaderDataV3.  It looks to me like this would cut a few hunks out
of this and maybe make it a bit easier to understand what is going on.
 At any rate, if we are going to stick with an explicit version number
in both versions, it should be marked in a consistent way, not _04
sometimes and just 04 other times.  My suggestion is e.g. "V4" but
YMMV.

The changes to nodeIndexscan.c and nodeSeqscan.c are worrisome to me.
It looks like the added code is (nearly?) identical in both places, so
probably it needs to be refactored to avoid code duplication.  I'm
also a bit skeptical about the idea of doing the tuple conversion
here.  Why here rather than ExecStoreTuple()?  If you decide to
convert the tuple, you can palloc the new one, pfree the old one if
ShouldFree is set, and reset shouldFree to true.
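
In rough C, that shouldFree handling might look like the sketch below. It is
not the real ExecStoreTuple: ToySlot, ToyStoredTuple, toy_store_tuple and
toy_clear_slot are hypothetical, the "conversion" just copies one field, and
malloc/free stand in for palloc/pfree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical tuple: a layout version and one payload field. */
typedef struct ToyStoredTuple
{
    int version;
    int value;
} ToyStoredTuple;

/* Hypothetical slot: remembers whether it owns the tuple it holds. */
typedef struct ToySlot
{
    ToyStoredTuple *tuple;
    bool            should_free;
} ToySlot;

/*
 * Store a tuple in the slot.  A V3 tuple is replaced by a freshly
 * allocated V4 copy; the old one is freed only if the caller said it
 * should be, and the slot then owns the copy.  Returns false on
 * allocation failure.
 */
bool
toy_store_tuple(ToySlot *slot, ToyStoredTuple *tuple, bool should_free)
{
    assert(slot != NULL && tuple != NULL);
    if (tuple->version == 3)
    {
        ToyStoredTuple *converted = malloc(sizeof(*converted));

        if (converted == NULL)
            return false;
        converted->version = 4;
        converted->value = tuple->value;    /* real code would remap fields */
        if (should_free)
            free(tuple);
        tuple = converted;
        should_free = true;
    }
    slot->tuple = tuple;
    slot->should_free = should_free;
    return true;
}

/* Release whatever the slot owns and empty it. */
void
toy_clear_slot(ToySlot *slot)
{
    if (slot->should_free)
        free(slot->tuple);
    slot->tuple = NULL;
    slot->should_free = false;
}
```

Code above the slot then only ever sees V4 tuples, and the scan nodes need no
conversion logic of their own.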

I am pretty skeptical of the idea that all of the HeapTuple* functions
can just be conditionalized on the page version and everything will
Just Work.  It seems like that is too low a level to be worrying about
such things.  Even if it happens to work for the changes between V3
and V4, what happens when V5 or V6 is changed in such a way that the
answer to HeapTupleIsWhatever is neither "Yes" nor "No", but rather
"Maybe" or "Seven"?  The performance hit also sounds painful.  I don't
have a better idea right now though...

I think it's going to be absolutely imperative to begin vacuuming away
old V3 pages as quickly as possible after the upgrade.  If you go with
the approach of converting the tuple in, or just before,
ExecStoreTuple, then you're going to introduce a lot of overhead when
working with V3 pages.  I think that's fine.  You should plan to do
your in-place upgrade at 1AM on Christmas morning (or whenever your
load hits rock bottom...) and immediately start converting the
database, starting with your most important and smallest tables.  In
fact, I would look whenever possible for ways to make the V4 case a
fast-path and just accept that the system is going to labor a bit when
dealing with V3 stuff.  Any overhead you introduce when dealing with
V3 pages can go away; any V4 overhead is permanent and therefore much
more difficult to accept.

That's about all I have for now... if you can give me some pointers on
working with this git repository, or provide a complete patch that
applies cleanly to CVS HEAD, I will try to look at this in more
detail.

...Robert
