Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)
Ron Mayer napsal(a): Tom Lane wrote: Another issue is that it might not be possible to update a page for lack of space. Are we prepared to assume that there will never be a transformation we need to apply that makes the data bigger? In such a situation an in-place update might be impossible, and that certainly takes it outside the bounds of what ReadBuffer can be expected to manage. Would a possible solution to this be that you could <snip> 2. Run some new maintenance command like vacuum expand or vacuum prepare_for_upgrade or something that would split any too-full pages, leaving only pages with enough space. It does not solve the problem for, e.g., TOAST tables. If the chunks do not fit on the new page layout, one of the chunk tuples has to be moved to a free page. That means you get a lot of pages with ~2kB of free, unused space. And if the max chunk size differs between versions, you have another problem as well. There is also an idea to change the compression algorithm for 8.4 (or to offer more variants). That also means you need to understand the old algorithm in the new version, or you need to repack everything on the old version. Zdenek -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)
Bruce Momjian napsal(a): Heikki Linnakangas wrote: Zdenek Kotala wrote: 4) Implementation The main point of the implementation is to have several versions of the PageHeader structure (e.g. PageHeader_04, PageHeader_03 ...), with the correct structure handled in a version-specific branch (see examples). (this won't come as a surprise as we talked about this in PGCon, but) I think we should rather convert the page structure to the new format in ReadBuffer the first time a page is read in. That would keep the changes a lot more isolated. Note that you need to handle not only page header changes, but changes to internal representations of different data types, and changes like varvarlen and combocid. Those are things that have happened in the past; in the future, I'm foreseeing changes to the toast header, for example, as there have been a lot of ideas related to toast options and compression. I understand the goal of having good modularity (not having ReadBuffer modify the page), but I am worried that doing multi-version page processing in a modular way is going to spread version-specific information all over the backend code, making it harder to understand. I don't think so. A page already contains its page-version information, and currently we have macros like PageSetLSN. Callers needn't know anything about the PageHeader representation; it is the responsibility of the page API to handle multiple versions correctly. We can use the same approach for tuple access. It is more complicated, but I think it is possible. Currently we have several macros (e.g. HeapTupleGetOid) which work on the HeapTupleData structure. All we need is to extend this API as well. I think in the end we will get more readable code. Zdenek
Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)
Zdenek Kotala [EMAIL PROTECTED] writes: It does not solve the problem for, e.g., TOAST tables. If the chunks do not fit on the new page layout, one of the chunk tuples has to be moved to a free page. That means you get a lot of pages with ~2kB of free, unused space. And if the max chunk size differs between versions, you have another problem as well. There is also an idea to change the compression algorithm for 8.4 (or to offer more variants). That also means you need to understand the old algorithm in the new version, or you need to repack everything on the old version. I don't have any problem at all with the idea that in-place update isn't going to support arbitrary changes of parameters, such as modifying the toast chunk size. In particular, anything that is locked down by pg_control isn't a problem. regards, tom lane
Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)
Heikki Linnakangas wrote: Zdenek Kotala wrote: 4) Implementation The main point of the implementation is to have several versions of the PageHeader structure (e.g. PageHeader_04, PageHeader_03 ...), with the correct structure handled in a version-specific branch (see examples). (this won't come as a surprise as we talked about this in PGCon, but) I think we should rather convert the page structure to the new format in ReadBuffer the first time a page is read in. That would keep the changes a lot more isolated. Note that you need to handle not only page header changes, but changes to internal representations of different data types, and changes like varvarlen and combocid. Those are things that have happened in the past; in the future, I'm foreseeing changes to the toast header, for example, as there have been a lot of ideas related to toast options and compression. I understand the goal of having good modularity (not having ReadBuffer modify the page), but I am worried that doing multi-version page processing in a modular way is going to spread version-specific information all over the backend code, making it harder to understand. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)
On Jun 11, 2008, at 10:42 AM, Heikki Linnakangas wrote: Another issue is that it might not be possible to update a page for lack of space. Are we prepared to assume that there will never be a transformation we need to apply that makes the data bigger? We do need some solution to that. One idea is to run a pre-upgrade script in the old version that scans the database and moves tuples that would no longer fit on their pages in the new version. This could be run before the upgrade, while the old database is still running, so it would be acceptable for that to take some time. That means old versions have to have some knowledge of new versions. There's also a big race condition unless the old version starts taking size requirements into account every time a page is dirtied. No doubt people would prefer something better than that. Another idea would be to have some over-sized buffers that can be used as the target of conversion, until some tuples are moved off to another page. Perhaps the over-sized buffers wouldn't need to be in shared memory, if they're read-only until some tuples are moved. This is pretty hand-wavy, I know. The point is, I don't think these problems are insurmountable. -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828
Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)
Tom Lane wrote: Another issue is that it might not be possible to update a page for lack of space. Are we prepared to assume that there will never be a transformation we need to apply that makes the data bigger? In such a situation an in-place update might be impossible, and that certainly takes it outside the bounds of what ReadBuffer can be expected to manage. Would a possible solution to this be that you could 1. Upgrade to the newest minor version of the old release (which has knowledge of the space requirements of the new one). 2. Run some new maintenance command like vacuum expand or vacuum prepare_for_upgrade or something that would split any too-full pages, leaving only pages with enough space. 3. Only then shut down the old server and start the new major-version server.
Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)
Zdenek Kotala [EMAIL PROTECTED] writes: There are examples:

void
PageSetFull(Page page)
{
    switch (PageGetPageLayoutVersion(page))
    {
        case 4:
            ((PageHeader_04) (page))->pd_flags |= PD_PAGE_FULL;
            break;
        default:
            elog(PANIC, "PageSetFull is not supported on page layout version %i",
                 PageGetPageLayoutVersion(page));
    }
}

LocationIndex
PageGetLower(Page page)
{
    switch (PageGetPageLayoutVersion(page))
    {
        case 4:
            return ((PageHeader_04) (page))->pd_lower;
    }
    elog(PANIC, "Unsupported page layout in function PageGetLower.");
}

I'm fairly concerned about the performance impact of turning what had been simple field accesses into function calls. I argue also that since none of the PageHeader fields have actually moved in any version that's likely to be supported, the above functions are actually of exactly zero value. The proposed PANIC in PageSetFull seems like it requires more thought as well: surely we don't want that ever to happen. Which means that callers need to be careful not to invoke such an operation on an un-updated page, but this proposed coding offers no aid in making sure that won't happen. What is needed there, I think, is some more global policy about what operations are permitted on old (un-converted) pages and a high-level approach to ensuring that unsafe operations aren't attempted. regards, tom lane
Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)
Zdenek Kotala wrote: 4) Implementation The main point of the implementation is to have several versions of the PageHeader structure (e.g. PageHeader_04, PageHeader_03 ...), with the correct structure handled in a version-specific branch (see examples). (this won't come as a surprise as we talked about this in PGCon, but) I think we should rather convert the page structure to the new format in ReadBuffer the first time a page is read in. That would keep the changes a lot more isolated. Note that you need to handle not only page header changes, but changes to internal representations of different data types, and changes like varvarlen and combocid. Those are things that have happened in the past; in the future, I'm foreseeing changes to the toast header, for example, as there have been a lot of ideas related to toast options and compression. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)
Heikki Linnakangas [EMAIL PROTECTED] writes: (this won't come as a surprise as we talked about this in PGCon, but) I think we should rather convert the page structure to new format in ReadBuffer the first time a page is read in. That would keep the changes a lot more isolated. The problem is that ReadBuffer is an extremely low-level environment, and it's not clear that it's possible (let alone practical) to do a conversion at that level in every case. In particular it hardly seems sane to expect ReadBuffer to do tuple content conversion, which is going to be practically impossible to perform without any catalog accesses. Another issue is that it might not be possible to update a page for lack of space. Are we prepared to assume that there will never be a transformation we need to apply that makes the data bigger? (Likely counterexample: adding collation info to text values.) In such a situation an in-place update might be impossible, and that certainly takes it outside the bounds of what ReadBuffer can be expected to manage. regards, tom lane
Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)
Tom Lane napsal(a): Zdenek Kotala [EMAIL PROTECTED] writes: There are examples:

void
PageSetFull(Page page)
{
    switch (PageGetPageLayoutVersion(page))
    {
        case 4:
            ((PageHeader_04) (page))->pd_flags |= PD_PAGE_FULL;
            break;
        default:
            elog(PANIC, "PageSetFull is not supported on page layout version %i",
                 PageGetPageLayoutVersion(page));
    }
}

LocationIndex
PageGetLower(Page page)
{
    switch (PageGetPageLayoutVersion(page))
    {
        case 4:
            return ((PageHeader_04) (page))->pd_lower;
    }
    elog(PANIC, "Unsupported page layout in function PageGetLower.");
}

I'm fairly concerned about the performance impact of turning what had been simple field accesses into function calls. I use functions now because it makes it easy to track what's going on. In the end they should (mostly) be macros. I argue also that since none of the PageHeader fields have actually moved in any version that's likely to be supported, the above functions are actually of exactly zero value. Yeah, that is why I'm thinking of using a page header with unions inside (for example for the TLI/flags field) and using a switch only in cases like the TLI or flags fields. What I don't know is whether the fields of this structure will be placed at the same offsets on all platforms. The proposed PANIC in PageSetFull seems like it requires more thought as well: surely we don't want that ever to happen. Which means that callers need to be careful not to invoke such an operation on an un-updated page, but this proposed coding offers no aid in making sure that won't happen. What is needed there, I think, is some more global policy about what operations are permitted on old (un-converted) pages and a high-level approach to ensuring that unsafe operations aren't attempted. ad) PANIC: The PANIC shouldn't happen, because page validation in ReadBuffer should check for a supported page version. ad) policy: good catch. I think all page read operations should be allowed on an old page version; only tuple, LSN, TLI, and special-space modifications should be allowed for writing.
PageAddItem should invoke page conversion before any action happens (even if there is free space for the tuple, it is possible to convert the page to the new format and find that the remaining space after conversion is smaller than the tuple). Zdenek
Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: (this won't come as a surprise as we talked about this in PGCon, but) I think we should rather convert the page structure to the new format in ReadBuffer the first time a page is read in. That would keep the changes a lot more isolated. The problem is that ReadBuffer is an extremely low-level environment, and it's not clear that it's possible (let alone practical) to do a conversion at that level in every case. Well, we can't predict the future, and can't guarantee that it's possible or practical to do the things we need to do in the future no matter what approach we choose. In particular it hardly seems sane to expect ReadBuffer to do tuple content conversion, which is going to be practically impossible to perform without any catalog accesses. ReadBuffer has access to the Relation, which has information about what kind of a relation it's dealing with, and the TupleDesc. That should get us pretty far. It would be a modularity violation, for sure, but I could live with that for the purpose of page version conversion. Another issue is that it might not be possible to update a page for lack of space. Are we prepared to assume that there will never be a transformation we need to apply that makes the data bigger? We do need some solution to that. One idea is to run a pre-upgrade script in the old version that scans the database and moves tuples that would no longer fit on their pages in the new version. This could be run before the upgrade, while the old database is still running, so it would be acceptable for that to take some time. No doubt people would prefer something better than that. Another idea would be to have some over-sized buffers that can be used as the target of conversion, until some tuples are moved off to another page. Perhaps the over-sized buffers wouldn't need to be in shared memory, if they're read-only until some tuples are moved. This is pretty hand-wavy, I know.
The point is, I don't think these problems are insurmountable. (Likely counterexample: adding collation info to text values.) I doubt it, as collation is not a property of text values, but of operations. But that's off-topic... -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)
Heikki Linnakangas napsal(a): Zdenek Kotala wrote: 4) Implementation The main point of the implementation is to have several versions of the PageHeader structure (e.g. PageHeader_04, PageHeader_03 ...), with the correct structure handled in a version-specific branch (see examples). (this won't come as a surprise as we talked about this in PGCon, but) I think we should rather convert the page structure to the new format in ReadBuffer the first time a page is read in. That would keep the changes a lot more isolated. I agree with Tom's reply. And anyway, this approach would be mostly isolated in page.c, and you need to be able to read old pages in both cases. Note that you need to handle not only page header changes, but changes to internal representations of different data types, and changes like varvarlen and combocid. Those are things that have happened in the past; in the future, I'm foreseeing changes to the toast header, for example, as there have been a lot of ideas related to toast options and compression. I know; this is a first small step toward in-place upgrade. The tuple header will follow. The page structure is the foundation. I want to split development into small steps, because they are easier to review. Zdenek
Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)
Heikki Linnakangas napsal(a): Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: (this won't come as a surprise as we talked about this in PGCon, but) I think we should rather convert the page structure to the new format in ReadBuffer the first time a page is read in. That would keep the changes a lot more isolated. The problem is that ReadBuffer is an extremely low-level environment, and it's not clear that it's possible (let alone practical) to do a conversion at that level in every case. Well, we can't predict the future, and can't guarantee that it's possible or practical to do the things we need to do in the future no matter what approach we choose. In particular it hardly seems sane to expect ReadBuffer to do tuple content conversion, which is going to be practically impossible to perform without any catalog accesses. ReadBuffer has access to the Relation, which has information about what kind of a relation it's dealing with, and the TupleDesc. That should get us pretty far. It would be a modularity violation, for sure, but I could live with that for the purpose of page version conversion. But if you look, for example, at the hash implementation, some pages are not in the regular format, and conversion could need more information than we have available in ReadBuffer. Another issue is that it might not be possible to update a page for lack of space. Are we prepared to assume that there will never be a transformation we need to apply that makes the data bigger? We do need some solution to that. One idea is to run a pre-upgrade script in the old version that scans the database and moves tuples that would no longer fit on their pages in the new version. This could be run before the upgrade, while the old database is still running, so it would be acceptable for that to take some time. It would not work for indexes, and do not forget TOAST chunks. I think in some cases you can end up with an unused quarter of each page in a TOAST table.
No doubt people would prefer something better than that. Another idea would be to have some over-sized buffers that can be used as the target of conversion, until some tuples are moved off to another page. Perhaps the over-sized buffers wouldn't need to be in shared memory, if they're read-only until some tuples are moved. Anyway, you need a mechanism to mark such a page read-only, which also requires a lot of modification, and some mechanism to decide when the page gets converted. I guess this approach will require modifications similar to convert-on-write. This is pretty hand-wavy, I know. The point is, I don't think these problems are insurmountable. (Likely counterexample: adding collation info to text values.) I doubt it, as collation is not a property of text values, but of operations. But that's off-topic... Yes, it is off-topic; however, I think Tom is right :-). Zdenek
Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)
Tom Lane [EMAIL PROTECTED] writes: (Likely counterexample: adding collation info to text values.) I don't think the argument really needs an example, but I would be pretty upset if we proposed tagging every text datum with a collation. Encoding perhaps, though that seems like a bad idea to me on performance grounds, but collation is not a property of the data at all. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support!
Re: [HACKERS] Proposal: Multiversion page api (inplace upgrade)
Gregory Stark wrote: Tom Lane [EMAIL PROTECTED] writes: (Likely counterexample: adding collation info to text values.) I don't think the argument really needs an example, but I would be pretty upset if we proposed tagging every text datum with a collation. Encoding perhaps, though that seems like a bad idea to me on performance grounds, but collation is not a property of the data at all. Again not directly related to difficulties upgrading pages... The recent discussion ... http://archives.postgresql.org/pgsql-hackers/2008-06/msg00102.php ... mentions keeping collation information together with text data, however it is referring to keeping it together when processing it, not when storing the text. Regards, Stephen Denne.