Re: Restructure ALTER TABLE notes to clarify table rewrites and verification scans
This entry has been waiting on author input for a while (our current threshold is roughly two weeks), so I've marked it Returned with Feedback. Once you think the patchset is ready for review again, you (or any interested party) can resurrect the patch entry by visiting https://commitfest.postgresql.org/38/3604/ and changing the status to "Needs Review", and then changing the status again to "Move to next CF". (Don't forget the second step; hopefully we will have streamlined this in the near future!) Thanks, --Jacob
Re: Restructure ALTER TABLE notes to clarify table rewrites and verification scans
I think this patch is missing "SET [UN]LOGGED", defaults of identity columns and domains, and access method. And tablespace, even though that rewrites the *files*, but not tuples (maybe these docs should say that).
Re: Restructure ALTER TABLE notes to clarify table rewrites and verification scans
On Fri, 1 Apr 2022 at 16:10, James Coleman wrote: > > On Thu, Mar 31, 2022 at 10:58 AM Matthias van de Meent > wrote: > > > > On Tue, 29 Mar 2022 at 16:20, James Coleman wrote: > > > > > > Over in the "Document atthasmissing default optimization avoids > > > verification table scan" thread David Johnston (who I've cc'd) > > > suggested that my goals might be better implemented with a simple > > > restructuring of the Notes section of the ALTER TABLE docs. I think > > > this is also along the lines of Tom Lane's suggestion of a "unified > > > discussion", but I've chosen for now (and simplicity's sake) not to > > > break this into an entirely new page. If reviewers feel that is > > > warranted at this stage, I can do that, but it seems to me that for > > > now this improves the structure and sets us up for such a future page > > > but falls short of sufficient content to move into its own page. > > > > > > One question on the changes: the current docs say "when attaching a > > > new partition it may be scanned to verify that existing rows meet the > > > partition constraint". The word "may" there seems to suggest there may > > > also be occasions where scans are not needed, but no examples of such > > > cases are present. I'm not immediately aware of such a case. Are these > > > constraints always validated? If not, in which cases can such a scan > > > be skipped? > > > > > > I've also incorporated the slight correction in "Correct docs re: > > > rewriting indexes when table rewrite is skipped" [2] here, and will > > > rebase this patch if that gets committed. > > > > See comments in that thread. > > Rebased since that thread has now resulted in a committed patch. > > > > +Changing the type of an existing column will require the entire > > > table and its > > > +indexes to be rewritten. As an exception, if the > > > USING clause > > > +does not change the column contents and the old type is either > > > binary coercible > > > +to the new type or an unconstrained domain over the new type, a > > > table rewrite is > > > +not needed. > > > > This implies "If the old type is [...] an unconstrained domain over > > the new type, a table rewrite is not needed.", which is the wrong way > > around. > > > > I'd go with something along the lines of: > > > > +Changing the type of an existing column will require the entire table > > to be > > +rewritten, unless the USING clause is only a > > binary coercible > > +cast, or if the new type is an unconstrained > > DOMAIN over the > > +old type. > > That language is actually unchanged from the existing docs; is there > an error in the existing docs you're seeing? I'm actually imagining > that it can probably got either way -- from unconstrained domain over > new type to new type or from old type to unconstrained domain over old > type. CREATE DOMAIN constrained AS text NOT NULL; CREATE DOMAIN unconstrained_on_constrained AS constrained; CREATE TABLE tst (col unconstrained_on_constrained); ALTER TABLE tst ALTER COLUMN col TYPE constrained; -- table scan Moving from an unconstrained domain over a constrained domain means that we still do the table scan. Domain nesting is weird in that way. > > That would drop the reference to index rebuilding; but that should be > > covered in other parts of the docs. > > Part of the whole point of this restructuring is to make both of those > clear; I think we should retain the comments about indexes. OK; I mentioned it because table rewrite also implies index rewrite; assuming this is correctly referenced in other parts of the docs. > > > +The following alterations of the table require the entire table, and > > > in some > > > +cases its indexes as well, to be rewritten. > > > > It is impossible to rewrite the table without at the same time also > > rewriting the indexes; as the location of tuples changes and thus > > previously generated indexes will become invalid. At the same time; > > changes to columns might not require a table rewrite, while still > > requiring the indexes to be rewritten. I suggest changing the order of > > "table" and "index", or dropping the clause. > > Ah, that's a good point. I've rewritten that part. > > > > +[...] For a large table such a rewrite > > > +may take a significant amount of time and will temporarily require > > > as much as > > > +double the disk space. > > > > I'd replace the will with could. Technically, this "double the disk > > space" could be even higher than that; due to index rebuilds taking up > > to 3x normal space (one original index which is only dropped at the > > end, one sorted tuple store for the rebuild, and one new index). > > That's also the existing language, but I agree it seems a bit overly > precise (and in the process probably incorrect). There's a lot of > complexity here: depending on the type change (and USING clause!) and > table width it could be even more than 3x. I've reworded to try to > capture
Re: Restructure ALTER TABLE notes to clarify table rewrites and verification scans
On Thu, Mar 31, 2022 at 10:58 AM Matthias van de Meent wrote: > > On Tue, 29 Mar 2022 at 16:20, James Coleman wrote: > > > > Over in the "Document atthasmissing default optimization avoids > > verification table scan" thread David Johnston (who I've cc'd) > > suggested that my goals might be better implemented with a simple > > restructuring of the Notes section of the ALTER TABLE docs. I think > > this is also along the lines of Tom Lane's suggestion of a "unified > > discussion", but I've chosen for now (and simplicity's sake) not to > > break this into an entirely new page. If reviewers feel that is > > warranted at this stage, I can do that, but it seems to me that for > > now this improves the structure and sets us up for such a future page > > but falls short of sufficient content to move into its own page. > > > > One question on the changes: the current docs say "when attaching a > > new partition it may be scanned to verify that existing rows meet the > > partition constraint". The word "may" there seems to suggest there may > > also be occasions where scans are not needed, but no examples of such > > cases are present. I'm not immediately aware of such a case. Are these > > constraints always validated? If not, in which cases can such a scan > > be skipped? > > > > I've also incorporated the slight correction in "Correct docs re: > > rewriting indexes when table rewrite is skipped" [2] here, and will > > rebase this patch if that gets committed. > > See comments in that thread. Rebased since that thread has now resulted in a committed patch. > > +Changing the type of an existing column will require the entire table > > and its > > +indexes to be rewritten. As an exception, if the > > USING clause > > +does not change the column contents and the old type is either binary > > coercible > > +to the new type or an unconstrained domain over the new type, a table > > rewrite is > > +not needed. > > This implies "If the old type is [...] an unconstrained domain over > the new type, a table rewrite is not needed.", which is the wrong way > around. > > I'd go with something along the lines of: > > +Changing the type of an existing column will require the entire table to > be > +rewritten, unless the USING clause is only a > binary coercible > +cast, or if the new type is an unconstrained > DOMAIN over the > +old type. That language is actually unchanged from the existing docs; is there an error in the existing docs you're seeing? I'm actually imagining that it can probably got either way -- from unconstrained domain over new type to new type or from old type to unconstrained domain over old type. > That would drop the reference to index rebuilding; but that should be > covered in other parts of the docs. Part of the whole point of this restructuring is to make both of those clear; I think we should retain the comments about indexes. > > +The following alterations of the table require the entire table, and > > in some > > +cases its indexes as well, to be rewritten. > > It is impossible to rewrite the table without at the same time also > rewriting the indexes; as the location of tuples changes and thus > previously generated indexes will become invalid. At the same time; > changes to columns might not require a table rewrite, while still > requiring the indexes to be rewritten. I suggest changing the order of > "table" and "index", or dropping the clause. Ah, that's a good point. I've rewritten that part. > > +[...] For a large table such a rewrite > > +may take a significant amount of time and will temporarily require as > > much as > > +double the disk space. > > I'd replace the will with could. Technically, this "double the disk > space" could be even higher than that; due to index rebuilds taking up > to 3x normal space (one original index which is only dropped at the > end, one sorted tuple store for the rebuild, and one new index). That's also the existing language, but I agree it seems a bit overly precise (and in the process probably incorrect). There's a lot of complexity here: depending on the type change (and USING clause!) and table width it could be even more than 3x. I've reworded to try to capture what's really going on here. Why "could" instead of "will"? All table rewrites will always require a extra disk space, right? > > -Similarly, when attaching a new partition it may be scanned to verify > > that > > -existing rows meet the partition constraint. > > +Attaching a new partition requires scanning the table to verify that > > existing > > +rows meet the partition constraint. > > This is also (and better!) documented under section > sql-altertable-attach-partition: we will skip full table scan if the > table partition's existing constraints already imply the new partition > constraints. The previous wording is better in that regard ("may > need", instead of "requires"), though it could be
Re: Restructure ALTER TABLE notes to clarify table rewrites and verification scans
On Tue, 29 Mar 2022 at 16:20, James Coleman wrote: > > Over in the "Document atthasmissing default optimization avoids > verification table scan" thread David Johnston (who I've cc'd) > suggested that my goals might be better implemented with a simple > restructuring of the Notes section of the ALTER TABLE docs. I think > this is also along the lines of Tom Lane's suggestion of a "unified > discussion", but I've chosen for now (and simplicity's sake) not to > break this into an entirely new page. If reviewers feel that is > warranted at this stage, I can do that, but it seems to me that for > now this improves the structure and sets us up for such a future page > but falls short of sufficient content to move into its own page. > > One question on the changes: the current docs say "when attaching a > new partition it may be scanned to verify that existing rows meet the > partition constraint". The word "may" there seems to suggest there may > also be occasions where scans are not needed, but no examples of such > cases are present. I'm not immediately aware of such a case. Are these > constraints always validated? If not, in which cases can such a scan > be skipped? > > I've also incorporated the slight correction in "Correct docs re: > rewriting indexes when table rewrite is skipped" [2] here, and will > rebase this patch if that gets committed. See comments in that thread. > +Changing the type of an existing column will require the entire table > and its > +indexes to be rewritten. As an exception, if the > USING clause > +does not change the column contents and the old type is either binary > coercible > +to the new type or an unconstrained domain over the new type, a table > rewrite is > +not needed. This implies "If the old type is [...] an unconstrained domain over the new type, a table rewrite is not needed.", which is the wrong way around. I'd go with something along the lines of: +Changing the type of an existing column will require the entire table to be +rewritten, unless the USING clause is only a binary coercible +cast, or if the new type is an unconstrained DOMAIN over the +old type. That would drop the reference to index rebuilding; but that should be covered in other parts of the docs. > +The following alterations of the table require the entire table, and in > some > +cases its indexes as well, to be rewritten. It is impossible to rewrite the table without at the same time also rewriting the indexes; as the location of tuples changes and thus previously generated indexes will become invalid. At the same time; changes to columns might not require a table rewrite, while still requiring the indexes to be rewritten. I suggest changing the order of "table" and "index", or dropping the clause. > +[...] For a large table such a rewrite > +may take a significant amount of time and will temporarily require as > much as > +double the disk space. I'd replace the will with could. Technically, this "double the disk space" could be even higher than that; due to index rebuilds taking up to 3x normal space (one original index which is only dropped at the end, one sorted tuple store for the rebuild, and one new index). > -Similarly, when attaching a new partition it may be scanned to verify > that > -existing rows meet the partition constraint. > +Attaching a new partition requires scanning the table to verify that > existing > +rows meet the partition constraint. This is also (and better!) documented under section sql-altertable-attach-partition: we will skip full table scan if the table partition's existing constraints already imply the new partition constraints. The previous wording is better in that regard ("may need", instead of "requires"), though it could be improved by refering to the sql-altertable-attach-partition section. Kind regards, Matthias van de Meent
Restructure ALTER TABLE notes to clarify table rewrites and verification scans
Over in the "Document atthasmissing default optimization avoids verification table scan" thread David Johnston (who I've cc'd) suggested that my goals might be better implemented with a simple restructuring of the Notes section of the ALTER TABLE docs. I think this is also along the lines of Tom Lane's suggestion of a "unified discussion", but I've chosen for now (and simplicity's sake) not to break this into an entirely new page. If reviewers feel that is warranted at this stage, I can do that, but it seems to me that for now this improves the structure and sets us up for such a future page but falls short of sufficient content to move into its own page. One question on the changes: the current docs say "when attaching a new partition it may be scanned to verify that existing rows meet the partition constraint". The word "may" there seems to suggest there may also be occasions where scans are not needed, but no examples of such cases are present. I'm not immediately aware of such a case. Are these constraints always validated? If not, in which cases can such a scan be skipped? I've also incorporated the slight correction in "Correct docs re: rewriting indexes when table rewrite is skipped" [2] here, and will rebase this patch if that gets committed. Thanks, James Coleman 1: https://www.postgresql.org/message-id/CAKFQuwZyBaJjNepdTM3kO8PLaCpRdRd8%2BmtLT8QdE73oAsGv8Q%40mail.gmail.com 2: https://www.postgresql.org/message-id/CAAaqYe90Ea3RG%3DA7H-ONvTcx549-oQhp07BrHErwM%3DAyH2ximg%40mail.gmail.com v1-0001-Restructure-ALTER-TABLE-notes-to-clarify-table-re.patch Description: Binary data