Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)

Zdenek Kotala Wed, 11 Jun 2008 09:05:57 -0700

Heikki Linnakangas wrote:

Tom Lane wrote:
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
(this won't come as a surprise as we talked about this in PGCon, but) I think we should rather convert the page structure to new format in ReadBuffer the first time a page is read in. That would keep the changes a lot more isolated.
The problem is that ReadBuffer is an extremely low-level environment,
and it's not clear that it's possible (let alone practical) to do a
conversion at that level in every case.
Well, we can't predict the future, and can't guarantee that it's possible or practical to do the things we need to do in the future no matter what approach we choose.
 In particular it hardly seems
sane to expect ReadBuffer to do tuple content conversion, which is going
to be practically impossible to perform without any catalog accesses.
ReadBuffer has access to Relation, which has information about what kind of a relation it's dealing with, and TupleDesc. That should get us pretty far. It would be a modularity violation, for sure, but I could live with that for the purpose of page version conversion.

But if you look, for example, at the hash implementation, some pages are not in the regular format, and conversion could need more information than we necessarily have in ReadBuffer.
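
For illustration only, here is a minimal sketch of the convert-on-read idea discussed above. It assumes a simplified page header and a single conversion step; DemoPageHeader, demo_convert_on_read and the version numbers 3 and 4 are hypothetical names and values chosen for the example, not PostgreSQL's actual structures or API. It shows only the dispatch on the stored layout version, including the case where the reader has no conversion routine and has to give up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical current layout version for this sketch. */
#define DEMO_CURRENT_LAYOUT_VERSION 4

/* Hypothetical, simplified page header (not PostgreSQL's PageHeaderData). */
typedef struct DemoPageHeader {
    uint16_t layout_version; /* layout version stored on the page */
    uint16_t lower;          /* end of the item pointer array */
    uint16_t upper;          /* start of tuple data */
} DemoPageHeader;

typedef enum {
    DEMO_CONVERT_NOT_NEEDED, /* page is already in the current layout */
    DEMO_CONVERT_DONE,       /* page was rewritten in place */
    DEMO_CONVERT_UNSUPPORTED /* no conversion routine for this version */
} DemoConvertResult;

/* Hypothetical single-step conversion; a real one would rewrite the page body. */
static bool demo_convert_v3_to_v4(DemoPageHeader *hdr)
{
    hdr->layout_version = 4;
    return true;
}

/* Called the first time a page is read in: convert it if it is in an old layout. */
static DemoConvertResult demo_convert_on_read(DemoPageHeader *hdr)
{
    assert(hdr != NULL);

    if (hdr->layout_version == DEMO_CURRENT_LAYOUT_VERSION)
        return DEMO_CONVERT_NOT_NEEDED;

    if (hdr->layout_version == 3 && demo_convert_v3_to_v4(hdr))
        return DEMO_CONVERT_DONE;

    return DEMO_CONVERT_UNSUPPORTED;
}
```

The sketch deliberately ignores the objections raised in this thread: pages that are not in the regular format, and conversions that need information the reader does not have.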

Another issue is that it might not be possible to update a page for
lack of space.  Are we prepared to assume that there will never be a
transformation we need to apply that makes the data bigger?
We do need some solution to that. One idea is to run a pre-upgrade script in the old version that scans the database and moves tuples that would no longer fit on their pages in the new version. This could be run before the upgrade, while the old database is still running, so it would be acceptable for that to take some time.

It could not work for indexes, and do not forget TOAST chunks. I think in some cases you could end up with an unused quarter of each page in a TOAST table.
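
For illustration only, the per-page check that such a pre-upgrade scan would have to make can be sketched as below. This assumes a fixed page size and a fixed per-item growth in the new format; DemoPageUsage, demo_overflow_after_upgrade and all the numbers are hypothetical and chosen for the example, not taken from PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page size for this sketch. */
#define DEMO_PAGE_SIZE 8192

/* Hypothetical summary of how a page is used in the old format. */
typedef struct DemoPageUsage {
    size_t item_count; /* number of items (tuples) on the page */
    size_t item_bytes; /* bytes used by item pointers plus tuple data */
} DemoPageUsage;

/*
 * Return the number of bytes by which the page would overflow after
 * conversion, or 0 if everything still fits. new_header_bytes and
 * growth_per_item describe the assumed new format.
 */
static size_t demo_overflow_after_upgrade(const DemoPageUsage *usage,
                                          size_t new_header_bytes,
                                          size_t growth_per_item)
{
    assert(usage != NULL);

    size_t needed = new_header_bytes
                  + usage->item_bytes
                  + usage->item_count * growth_per_item;

    return needed > DEMO_PAGE_SIZE ? needed - DEMO_PAGE_SIZE : 0;
}
```

A page that reports a non-zero overflow is one from which the scan would have to move items away before the upgrade. As noted above, moving items like this is not an option for indexes, so the sketch only covers the heap case.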

No doubt people would prefer something better than that. Another idea would be to have some over-sized buffers that can be used as the target of conversion, until some tuples are moved off to another page. Perhaps the over-sized buffer wouldn't need to be in shared memory, if they're read-only until some tuples are moved.

Anyway, you need a mechanism to mark that the page is read-only, which also requires a lot of modification, and some mechanism to decide when the page gets converted. I guess this approach will require similar modifications to convert-on-write.

This is pretty hand-wavy, I know. The point is, I don't think these problems are insurmountable.
 (Likely counterexample: adding collation info to text values.)
I doubt it, as collation is not a property of text values, but operations. But that's off-topic...


Yes, it is off-topic; however, I think Tom is right :-).

                Zdenek



