Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)

2008-06-13 Thread Zdenek Kotala

Ron Mayer wrote:

Tom Lane wrote:

Another issue is that it might not be possible to update a page for
lack of space.  Are we prepared to assume that there will never be a
transformation we need to apply that makes the data bigger?   In such a
situation an in-place update might be impossible, and that certainly
takes it outside the bounds of what ReadBuffer can be expected to manage.


Would a possible solution to this be that you could



snip



  2. Run some new maintenance command like vacuum expand or
     vacuum prepare_for_upgrade or something that would split
     any too-full pages, leaving only pages with enough space.


It does not solve every problem, for example with TOAST tables. If the chunks do 
not fit on a page in the new layout, one of the chunk tuples has to be moved to 
a free page. That means you get a lot of pages with ~2kB of unused free space. 
And if the maximum chunk size differs between versions, you get another problem 
as well.
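
The arithmetic behind the ~2kB figure can be sketched as follows. This is only 
an illustration: the page size, the chunk tuple size, and the header sizes used 
below are assumed numbers, not the real PostgreSQL constants, and the function 
names are hypothetical.

```c
#include <assert.h>

/* Illustrative numbers only: assumptions for this sketch,
 * not the real PostgreSQL constants. */
#define SKETCH_PAGE_SIZE    8192u
#define SKETCH_CHUNK_TUPLE  2032u   /* one TOAST chunk tuple, all overhead included */

/* How many whole chunk tuples fit on a page after the page header. */
unsigned sketch_chunks_per_page(unsigned header_size)
{
    assert(header_size < SKETCH_PAGE_SIZE);
    return (SKETCH_PAGE_SIZE - header_size) / SKETCH_CHUNK_TUPLE;
}

/* Bytes left unused once the page holds as many chunks as fit. */
unsigned sketch_wasted_bytes(unsigned header_size)
{
    return (SKETCH_PAGE_SIZE - header_size)
        - sketch_chunks_per_page(header_size) * SKETCH_CHUNK_TUPLE;
}
```

With an assumed 24-byte header four chunks fit and 40 bytes are left over; if 
the header grew to an assumed 72 bytes, only three chunks would fit and 2024 
bytes of every such page would stay unused.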


There is also an idea to change the compression algorithm for 8.4 (or to offer 
more variants). That also means you need to understand the old algorithm in the 
new version, or you need to repack everything on the old version.



Zdenek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)

2008-06-13 Thread Zdenek Kotala

Bruce Momjian wrote:

Heikki Linnakangas wrote:

Zdenek Kotala wrote:

4) Implementation

The main point of implementation is to have several versions of the 
PageHeader structure (e.g. PageHeader_04, PageHeader_03 ...) and the correct 
structure will be handled in a special branch (see examples).

(this won't come as a surprise as we talked about this in PGCon, but) I 
think we should rather convert the page structure to new format in 
ReadBuffer the first time a page is read in. That would keep the changes 
a lot more isolated.


Note that you need to handle not only page header changes, but changes 
to internal representations of different data types, and changes like 
varvarlen and combocid. Those are things that have happened in the past; 
in the future, I'm foreseeing changes to the toast header, for example, 
as there's been a lot of ideas related to toast options compression.


I understand the goal of having good modularity (not having ReadBuffer
modify the page), but I am worried that doing multi-version page
processing in a modular way is going to spread version-specific
information all over the backend code, making it harder to understand.


I don't think so. The page already contains its version information, and we 
currently have macros like PageSetLSN. The caller need not know anything about 
the PageHeader representation. It is the responsibility of the page API to 
handle multiple versions correctly.


We can use the same approach for tuple access. It is more complicated, but I 
think it is possible. Currently we have several macros (e.g. HeapTupleGetOid) 
which work on the TupleData structure. All we need is to extend this API as well.


I think in the end we will get more readable code.

Zdenek




Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)

2008-06-13 Thread Tom Lane
Zdenek Kotala [EMAIL PROTECTED] writes:
> It does not solve every problem, for example with TOAST tables. If the chunks
> do not fit on a page in the new layout, one of the chunk tuples has to be
> moved to a free page. That means you get a lot of pages with ~2kB of unused
> free space. And if the maximum chunk size differs between versions, you get
> another problem as well.
>
> There is also an idea to change the compression algorithm for 8.4 (or to
> offer more variants). That also means you need to understand the old
> algorithm in the new version, or you need to repack everything on the old
> version.

I don't have any problem at all with the idea that in-place update isn't
going to support arbitrary changes of parameters, such as modifying the
toast chunk size.  In particular anything that is locked down by
pg_control isn't a problem.

regards, tom lane



Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)

2008-06-12 Thread Bruce Momjian
Heikki Linnakangas wrote:
> Zdenek Kotala wrote:
> > 4) Implementation
> >
> > The main point of implementation is to have several versions of the
> > PageHeader structure (e.g. PageHeader_04, PageHeader_03 ...) and the correct
> > structure will be handled in a special branch (see examples).
>
> (this won't come as a surprise as we talked about this in PGCon, but) I
> think we should rather convert the page structure to new format in
> ReadBuffer the first time a page is read in. That would keep the changes
> a lot more isolated.
>
> Note that you need to handle not only page header changes, but changes
> to internal representations of different data types, and changes like
> varvarlen and combocid. Those are things that have happened in the past;
> in the future, I'm foreseeing changes to the toast header, for example,
> as there's been a lot of ideas related to toast options compression.

I understand the goal of having good modularity (not having ReadBuffer
modify the page), but I am worried that doing multi-version page
processing in a modular way is going to spread version-specific
information all over the backend code, making it harder to understand.

-- 
  Bruce Momjian  [EMAIL PROTECTED]        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)

2008-06-12 Thread Decibel!

On Jun 11, 2008, at 10:42 AM, Heikki Linnakangas wrote:


> > Another issue is that it might not be possible to update a page for
> > lack of space.  Are we prepared to assume that there will never be a
> > transformation we need to apply that makes the data bigger?
>
> We do need some solution to that. One idea is to run a pre-upgrade
> script in the old version that scans the database and moves tuples
> that would no longer fit on their pages in the new version. This
> could be run before the upgrade, while the old database is still
> running, so it would be acceptable for that to take some time.

That means old versions have to have some knowledge of new versions.
There's also a big race condition unless the old version starts
taking size requirements into account every time a page is dirtied.

> No doubt people would prefer something better than that. Another
> idea would be to have some over-sized buffers that can be used as
> the target of conversion, until some tuples are moved off to
> another page. Perhaps the over-sized buffer wouldn't need to be in
> shared memory, if they're read-only until some tuples are moved.
>
> This is pretty hand-wavy, I know. The point is, I don't think these
> problems are insurmountable.


--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)

2008-06-12 Thread Ron Mayer

Tom Lane wrote:

Another issue is that it might not be possible to update a page for
lack of space.  Are we prepared to assume that there will never be a
transformation we need to apply that makes the data bigger?   In such a
situation an in-place update might be impossible, and that certainly
takes it outside the bounds of what ReadBuffer can be expected to manage.


Would a possible solution to this be that you could

  1. Upgrade to the newest minor-version of the old release
     (which has knowledge of the space requirements of the
     new one).

  2. Run some new maintenance command like vacuum expand or
     vacuum prepare_for_upgrade or something that would split
     any too-full pages, leaving only pages with enough space.

  3. Only then shut down the old server and start the
     new major-version server.
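
The check that step 2 would have to make on each page could look roughly like 
the sketch below. Everything in it is hypothetical: the struct names, the 
function names, and the idea that the growth can be described by one per-page 
and one per-tuple figure are assumptions for illustration, not anything that 
exists in PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page summary a pre-upgrade scan might collect. */
typedef struct SketchPageStats
{
    unsigned free_bytes;   /* free space on the page in the old layout */
    unsigned ntuples;      /* number of tuples stored on the page */
} SketchPageStats;

/* Assumed growth figures; the old release would have to be told these. */
typedef struct SketchGrowth
{
    unsigned header_growth;     /* extra bytes the new page header needs */
    unsigned per_tuple_growth;  /* extra bytes each tuple needs */
} SketchGrowth;

/* True if the page could not hold its own contents after conversion. */
bool sketch_page_too_full(const SketchPageStats *page, const SketchGrowth *g)
{
    assert(page != NULL && g != NULL);
    unsigned long needed = (unsigned long) g->header_growth
        + (unsigned long) page->ntuples * g->per_tuple_growth;
    return needed > page->free_bytes;
}

/* Count the pages a maintenance command would have to split. */
size_t sketch_count_pages_to_split(const SketchPageStats *pages, size_t npages,
                                   const SketchGrowth *g)
{
    size_t count = 0;
    for (size_t i = 0; i < npages; i++)
        if (sketch_page_too_full(&pages[i], g))
            count++;
    return count;
}
```

A one-off scan like this says nothing about pages that fill up again before 
the old server is shut down, which is the race condition raised earlier in the 
thread.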




Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)

2008-06-11 Thread Tom Lane
Zdenek Kotala [EMAIL PROTECTED] writes:
> There are examples:
>
> void PageSetFull(Page page)
> {
>   switch ( PageGetPageLayoutVersion(page) )
>   {
>     case 4 : ((PageHeader_04) (page))->pd_flags |= PD_PAGE_FULL;
>              break;
>     default : elog(PANIC, "PageSetFull is not supported on page layout version %i",
>                    PageGetPageLayoutVersion(page));
>   }
> }
>
> LocationIndex PageGetLower(Page page)
> {
>   switch ( PageGetPageLayoutVersion(page) )
>   {
>     case 4 : return ((PageHeader_04) (page))->pd_lower;
>   }
>   elog(PANIC, "Unsupported page layout in function PageGetLower.");
> }

I'm fairly concerned about the performance impact of turning what had
been simple field accesses into function calls.  I argue also that since
none of the PageHeader fields have actually moved in any version that's
likely to be supported, the above functions are actually of exactly
zero value.

The proposed PANIC in PageSetFull seems like it requires more thought as
well: surely we don't want that ever to happen.  Which means that
callers need to be careful not to invoke such an operation on an
un-updated page, but this proposed coding offers no aid in making sure
that won't happen.  What is needed there, I think, is some more global
policy about what operations are permitted on old (un-converted) pages
and a high-level approach to ensuring that unsafe operations aren't
attempted.

regards, tom lane



Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)

2008-06-11 Thread Heikki Linnakangas

Zdenek Kotala wrote:

4) Implementation

The main point of implementation is to have several versions of the 
PageHeader structure (e.g. PageHeader_04, PageHeader_03 ...) and the correct 
structure will be handled in a special branch (see examples).


(this won't come as a surprise as we talked about this in PGCon, but) I 
think we should rather convert the page structure to new format in 
ReadBuffer the first time a page is read in. That would keep the changes 
a lot more isolated.


Note that you need to handle not only page header changes, but changes 
to internal representations of different data types, and changes like 
varvarlen and combocid. Those are things that have happened in the past; 
in the future, I'm foreseeing changes to the toast header, for example, 
as there's been a lot of ideas related to toast options compression.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)

2008-06-11 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
> (this won't come as a surprise as we talked about this in PGCon, but) I
> think we should rather convert the page structure to new format in
> ReadBuffer the first time a page is read in. That would keep the changes
> a lot more isolated.

The problem is that ReadBuffer is an extremely low-level environment,
and it's not clear that it's possible (let alone practical) to do a
conversion at that level in every case.  In particular it hardly seems
sane to expect ReadBuffer to do tuple content conversion, which is going
to be practically impossible to perform without any catalog accesses.

Another issue is that it might not be possible to update a page for
lack of space.  Are we prepared to assume that there will never be a
transformation we need to apply that makes the data bigger?  (Likely
counterexample: adding collation info to text values.)  In such a
situation an in-place update might be impossible, and that certainly
takes it outside the bounds of what ReadBuffer can be expected to manage.

regards, tom lane



Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)

2008-06-11 Thread Zdenek Kotala

Tom Lane wrote:

Zdenek Kotala [EMAIL PROTECTED] writes:

There are examples:



void PageSetFull(Page page)
{
    switch ( PageGetPageLayoutVersion(page) )
    {
        case 4 : ((PageHeader_04) (page))->pd_flags |= PD_PAGE_FULL;
                 break;
        default : elog(PANIC, "PageSetFull is not supported on page layout version %i",
                       PageGetPageLayoutVersion(page));
    }
}


LocationIndex PageGetLower(Page page)
{
    switch ( PageGetPageLayoutVersion(page) )
    {
        case 4 : return ((PageHeader_04) (page))->pd_lower;
    }
    elog(PANIC, "Unsupported page layout in function PageGetLower.");
}


I'm fairly concerned about the performance impact of turning what had
been simple field accesses into function calls.  


I use functions for now because that makes it easy to track what's going on. In 
the end they should (mostly) be macros.



I argue also that since
none of the PageHeader fields have actually moved in any version that's
likely to be supported, the above functions are actually of exactly
zero value.


Yeah, that is why I'm thinking of using a page header with unions inside (for 
example for the TSL/flag field) and using a switch only in cases like the TSL or 
flags fields. What I don't know is whether the fields in this structure will be 
placed at the same offsets on all platforms.
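
One way to approach the placement question is to let the compiler check it. The 
sketch below is an assumption-laden illustration, not the real header: the field 
names, the split of one 32-bit word into two 16-bit fields, and the flag value 
are all hypothetical. The compile-time assertions (C11) fail the build on any 
platform where the offsets differ from what the code expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PD_PAGE_FULL 0x0002   /* assumed flag value */

/* The one word whose meaning is assumed to differ between layouts. */
typedef union SketchVersionedWord
{
    uint32_t v3_word;            /* assumed older layout: one 32-bit field */
    struct
    {
        uint16_t low;            /* assumed newer layout: two 16-bit fields */
        uint16_t flags;
    } v4;
} SketchVersionedWord;

/* Hypothetical, simplified header: not the real PageHeaderData. */
typedef struct SketchPageHeader
{
    uint64_t            lsn;               /* same place in every version */
    SketchVersionedWord u;
    uint16_t            lower;
    uint16_t            upper;
    uint16_t            special;
    uint16_t            pagesize_version;  /* low byte: layout version */
} SketchPageHeader;

/* Build fails if a platform lays the fields out differently. */
_Static_assert(sizeof(SketchVersionedWord) == 4, "union must stay 4 bytes");
_Static_assert(offsetof(SketchPageHeader, u) == 8, "union moved");
_Static_assert(offsetof(SketchPageHeader, lower) == 12, "lower moved");

unsigned sketch_layout_version(const SketchPageHeader *h)
{
    assert(h != NULL);
    return h->pagesize_version & 0x00FFu;
}

/* Needs a version check: only the newer layout has a flags field.
 * Returns 1 if the flag was set, 0 if the layout does not support it. */
int sketch_set_full(SketchPageHeader *h)
{
    if (sketch_layout_version(h) != 4)
        return 0;
    h->u.v4.flags |= SKETCH_PD_PAGE_FULL;
    return 1;
}

/* No version check needed: the field sits at the same offset everywhere. */
uint16_t sketch_get_lower(const SketchPageHeader *h)
{
    return h->lower;
}
```

Returning 0 instead of raising PANIC is a choice made for the sketch only; it 
leaves the decision about what to do with an old page to the caller.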



The proposed PANIC in PageSetFull seems like it requires more thought as
well: surely we don't want that ever to happen.  Which means that
callers need to be careful not to invoke such an operation on an
un-updated page, but this proposed coding offers no aid in making sure
that won't happen.  What is needed there, I think, is some more global
policy about what operations are permitted on old (un-converted) pages
and a high-level approach to ensuring that unsafe operations aren't
attempted.


ad) PANIC
PANIC shouldn't happen, because page validation in BufferRead should check for 
a supported page version.


ad) policy - it is a good catch. I think all page read operations should be 
allowed on an old page version. For writing, only tuple, LSN, TSL, and special 
space modifications should be allowed. Addpageitem should invoke page conversion 
before any action happens (if there is free space for the tuple, it is possible 
to convert the page into the new format, but after conversion the free space 
could be smaller than the tuple).
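
Written down as code, the policy above might look like this. It is a sketch 
only: the operation names, the function names, and the notion of a single 
"conversion cost" in bytes are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_CURRENT_LAYOUT 4u

/* Hypothetical classification of page operations. */
typedef enum SketchPageOp
{
    SKETCH_OP_READ,
    SKETCH_OP_MODIFY_TUPLE,
    SKETCH_OP_SET_LSN,
    SKETCH_OP_SET_TSL,
    SKETCH_OP_MODIFY_SPECIAL,
    SKETCH_OP_ADD_ITEM
} SketchPageOp;

/* May this operation run on the page without converting it first? */
bool sketch_op_allowed_without_conversion(unsigned layout_version, SketchPageOp op)
{
    if (layout_version == SKETCH_CURRENT_LAYOUT)
        return true;                 /* current pages allow everything */
    switch (op)
    {
        case SKETCH_OP_READ:
        case SKETCH_OP_MODIFY_TUPLE:
        case SKETCH_OP_SET_LSN:
        case SKETCH_OP_SET_TSL:
        case SKETCH_OP_MODIFY_SPECIAL:
            return true;
        case SKETCH_OP_ADD_ITEM:
            return false;            /* must convert the page first */
    }
    return false;
}

/* Adding an item must be judged against the free space that remains
 * after conversion, not the free space seen on the old page. */
bool sketch_can_add_item(unsigned free_before, unsigned conversion_cost,
                         unsigned item_size)
{
    if (conversion_cost > free_before)
        return false;                /* the page cannot even be converted */
    return free_before - conversion_cost >= item_size;
}
```

The second function covers the case mentioned in parentheses: a tuple that fits 
on the old page may no longer fit once the conversion has used up some of the 
free space.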


Zdenek










Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)

2008-06-11 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:
(this won't come as a surprise as we talked about this in PGCon, but) I 
think we should rather convert the page structure to new format in 
ReadBuffer the first time a page is read in. That would keep the changes 
a lot more isolated.


The problem is that ReadBuffer is an extremely low-level environment,
and it's not clear that it's possible (let alone practical) to do a
conversion at that level in every case.


Well, we can't predict the future, and can't guarantee that it's 
possible or practical to do the things we need to do in the future no 
matter what approach we choose.



 In particular it hardly seems
sane to expect ReadBuffer to do tuple content conversion, which is going
to be practically impossible to perform without any catalog accesses.


ReadBuffer has access to Relation, which has information about what kind 
of a relation it's dealing with, and TupleDesc. That should get us 
pretty far. It would be a modularity violation, for sure, but I could 
live with that for the purpose of page version conversion.



Another issue is that it might not be possible to update a page for
lack of space.  Are we prepared to assume that there will never be a
transformation we need to apply that makes the data bigger?


We do need some solution to that. One idea is to run a pre-upgrade 
script in the old version that scans the database and moves tuples that 
would no longer fit on their pages in the new version. This could be run 
before the upgrade, while the old database is still running, so it would 
be acceptable for that to take some time.


No doubt people would prefer something better than that. Another idea 
would be to have some over-sized buffers that can be used as the target 
of conversion, until some tuples are moved off to another page. Perhaps 
the over-sized buffer wouldn't need to be in shared memory, if they're 
read-only until some tuples are moved.


This is pretty hand-wavy, I know. The point is, I don't think these 
problems are insurmountable.



 (Likely counterexample: adding collation info to text values.)


I doubt it, as collation is not a property of text values, but 
operations. But that's off-topic...


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)

2008-06-11 Thread Zdenek Kotala

Heikki Linnakangas wrote:

Zdenek Kotala wrote:

4) Implementation

The main point of implementation is to have several versions of the 
PageHeader structure (e.g. PageHeader_04, PageHeader_03 ...) and the 
correct structure will be handled in a special branch (see examples).


(this won't come as a surprise as we talked about this in PGCon, but) I 
think we should rather convert the page structure to new format in 
ReadBuffer the first time a page is read in. That would keep the changes 
a lot more isolated.


I agree with Tom's reply. And anyway, this approach will be mostly isolated in 
page.c, and you need to be able to read old pages in both cases.


Note that you need to handle not only page header changes, but changes 
to internal representations of different data types, and changes like 
varvarlen and combocid. Those are things that have happened in the past; 
in the future, I'm foreseeing changes to the toast header, for example, 
as there's been a lot of ideas related to toast options compression.


I know; this is a first small step toward in-place upgrade. The tuple header 
will follow. The page structure is the foundation. I want to split development 
into small steps, because they are easier to review.


 Zdenek




Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)

2008-06-11 Thread Zdenek Kotala

Heikki Linnakangas wrote:

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:
(this won't come as a surprise as we talked about this in PGCon, but) 
I think we should rather convert the page structure to new format in 
ReadBuffer the first time a page is read in. That would keep the 
changes a lot more isolated.


The problem is that ReadBuffer is an extremely low-level environment,
and it's not clear that it's possible (let alone practical) to do a
conversion at that level in every case.


Well, we can't predict the future, and can't guarantee that it's 
possible or practical to do the things we need to do in the future no 
matter what approach we choose.



 In particular it hardly seems
sane to expect ReadBuffer to do tuple content conversion, which is going
to be practically impossible to perform without any catalog accesses.


ReadBuffer has access to Relation, which has information about what kind 
of a relation it's dealing with, and TupleDesc. That should get us 
pretty far. It would be a modularity violation, for sure, but I could 
live with that for the purpose of page version conversion.


But if you look, for example, at the hash implementation, some pages are not in 
the regular format, and conversion could need more information, which we might 
not have in ReadBuffer.



Another issue is that it might not be possible to update a page for
lack of space.  Are we prepared to assume that there will never be a
transformation we need to apply that makes the data bigger?


We do need some solution to that. One idea is to run a pre-upgrade 
script in the old version that scans the database and moves tuples that 
would no longer fit on their pages in the new version. This could be run 
before the upgrade, while the old database is still running, so it would 
be acceptable for that to take some time.


It might not work for indexes, and do not forget TOAST chunks. I think in some 
cases you can end up with an unused quarter of each page in a TOAST table.


No doubt people would prefer something better than that. Another idea 
would be to have some over-sized buffers that can be used as the target 
of conversion, until some tuples are moved off to another page. Perhaps 
the over-sized buffer wouldn't need to be in shared memory, if they're 
read-only until some tuples are moved.


Anyway, you need a mechanism to mark that the page is read-only, which also 
requires a lot of modification, and some mechanism to decide when the page gets 
converted. I guess this approach will require modifications similar to 
convert-on-write.


This is pretty hand-wavy, I know. The point is, I don't think these 
problems are insurmountable.



 (Likely counterexample: adding collation info to text values.)


I doubt it, as collation is not a property of text values, but 
operations. But that's off-topic...


Yes, it is off-topic; however, I think Tom is right :-).

Zdenek






Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)

2008-06-11 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

> (Likely counterexample: adding collation info to text values.)

I don't think the argument really needs an example, but I would be pretty
upset if we proposed tagging every text datum with a collation. Encoding
perhaps, though that seems like a bad idea to me on performance grounds, but
collation is not a property of the data at all.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!



Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)

2008-06-11 Thread Stephen Denne
Gregory Stark wrote:
> Tom Lane [EMAIL PROTECTED] writes:
>
> > (Likely counterexample: adding collation info to text values.)
>
> I don't think the argument really needs an example, but I would be pretty
> upset if we proposed tagging every text datum with a collation. Encoding
> perhaps, though that seems like a bad idea to me on performance grounds, but
> collation is not a property of the data at all.

Again not directly related to difficulties upgrading pages...

The recent discussion ...
http://archives.postgresql.org/pgsql-hackers/2008-06/msg00102.php
... mentions keeping collation information together with text data,
however it is referring to keeping it together when processing it,
not when storing the text.

Regards,
Stephen Denne.