Re: Document atthasmissing default optimization avoids verification table scan

2022-03-29 Thread James Coleman
On Sun, Mar 27, 2022 at 11:12 PM David G. Johnston
 wrote:
>
> On Sun, Mar 27, 2022 at 11:17 AM James Coleman  wrote:
>>
>> Hmm,  I didn't realize that was project policy,
>
>
> Guideline/Rule of Thumb is probably a better concept.

Ah, OK, thanks.

>>
>> but I'm a bit
>> surprised given that the sentence which 0001 replaces seems like a
>> direct violation of that also: "In neither case is a rewrite of the
>> table required."
>>
>
> IMO mostly due to the structuring of the paragraphs; something like the 
> following makes it less problematic (and as shown below may be sufficient to 
> address the purpose of this patch)
>
> """
> [...]
> The following alterations of the table require the entire table, and/or its 
> indexes, to be rewritten; which may take a significant amount of time for a 
> large table, and will temporarily require as much as double the disk space.
>
> Changing the type of an existing column will require the entire table and its 
> indexes to be rewritten. As an exception, if the USING clause does not change 
> the column contents and the old type is either binary coercible to the new 
> type, or an unconstrained domain over the new type, a table rewrite is not 
> needed; but any indexes on the affected columns must still be rewritten.
>
> Adding a column with a volatile DEFAULT also requires the entire table and 
> its indexes to be rewritten.
>
> The reason a non-volatile (or absent) DEFAULT does not require a rewrite of 
> the table is because the DEFAULT expression (or NULL) is evaluated at the 
> time of the statement and the result is stored in the table's metadata.
>
> The following alterations of the table require that it be scanned in its 
> entirety to ensure that no existing values are contrary to the new 
> constraints placed on the table.  Constraints backed by indexes will scan the 
> table as a side-effect of populating the new index with data.
>
> Adding a CHECK constraint requires scanning the table to verify that existing 
> rows meet the constraint.  The same goes for adding a NOT NULL constraint to 
> an existing column.
>
> A newly attached partition requires scanning the table to verify that 
> existing rows meet the partition constraint.
>
> A foreign key constraint requires scanning the table to verify that all 
> existing values exist on the referenced table.
>
> The main reason for providing the option to specify multiple changes in a 
> single ALTER TABLE is that multiple table scans or rewrites can thereby be 
> combined into a single pass over the table.
>
> Scanning a large table to verify a new constraint can take a long time, and 
> other updates to the table are locked out until the ALTER TABLE ADD 
> CONSTRAINT command is committed. For CHECK and FOREIGN KEY constraints there 
> is an option, NOT VALID, that reduces the impact of adding a constraint on 
> concurrent updates. With NOT VALID, the ADD CONSTRAINT command does not scan 
> the table and can be committed immediately. After that, a VALIDATE CONSTRAINT 
> command can be issued to verify that existing rows satisfy the constraint. 
> The validation step does not need to lock out concurrent updates, since it 
> knows that other transactions will be enforcing the constraint for rows that 
> they insert or update; only pre-existing rows need to be checked. Hence, 
> validation acquires only a SHARE UPDATE EXCLUSIVE lock on the table being 
> altered. (If the constraint is a foreign key then a ROW SHARE lock is also 
> required on the table referenced by the constraint.) In addition to improving 
> concurrency, it can be useful to use NOT VALID and VALIDATE CONSTRAINT in 
> cases where the table is known to contain pre-existing violations. Once the 
> constraint is in place, no new violations can be inserted, and the existing 
> problems can be corrected at leisure until VALIDATE CONSTRAINT finally 
> succeeds.
>
> The DROP COLUMN form does not physically remove the column, but simply makes 
> it invisible to SQL operations. Subsequent insert and update operations in 
> the table will store a null value for the column. Thus, dropping a column is 
> quick but it will not immediately reduce the on-disk size of your table, as 
> the space occupied by the dropped column is not reclaimed. The space will be 
> reclaimed over time as existing rows are updated.
>
> To force immediate reclamation of space occupied by a dropped column, you can 
> execute one of the forms of ALTER TABLE that performs a rewrite of the whole 
> table. This results in reconstructing each row with the dropped column 
> replaced by a null value.
>
> The rewriting forms of ALTER TABLE are not MVCC-safe. After a table rewrite, 
> the table will appear empty to concurrent transactions, if they are using a 
> snapshot taken before the rewrite occurred. See Section 13.5 for more details.
> [...]
> """
>
> I'm liking the idea of breaking out multiple features into their own 
> sentences or paragraphs instead of saying:
>
> "Adding a column 

Re: Document atthasmissing default optimization avoids verification table scan

2022-03-28 Thread Robert Haas
On Mon, Mar 28, 2022 at 9:54 AM James Coleman  wrote:
> No, I've appreciated constructive feedback from both Tom and David on
> this thread. Your original email was so incredibly strongly worded
> (and contained no constructive recommendations about a better path
> forward, unlike Tom's and David's replies), and I had a hard time
> understanding what could possibly have made you that irritated with a
> proposal to document how to avoid long-running table scans while
> holding an exclusive lock.

I don't think I was particularly irritated then, but I admit I'm
getting irritated now. I clearly said that the documentation wasn't
perfect but that I didn't think these patches made it better, and I
explained why in some detail. It's not like I said "you suck and I
hate you and please go die in a fire" or something like that. So why
is that "incredibly strongly worded"? Especially when both David and
Tom agreed with my recommendation that we reject these patches as
proposed?

There are probably patches in this CommitFest that have gotten no
review from anyone, but it's pretty hard to find them, because the
CommitFest is full of patches like this one, which have been reviewed
fairly extensively yet which, for one reason or another, don't seem
likely to go anywhere any time soon. I think that's a much bigger
problem for the project than the lack of documentation on this
particular issue. Of course, you will likely disagree.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: Document atthasmissing default optimization avoids verification table scan

2022-03-28 Thread James Coleman
On Mon, Mar 28, 2022 at 9:30 AM Robert Haas  wrote:
>
> On Sun, Mar 27, 2022 at 1:00 PM James Coleman  wrote:
> > So "undocumented concept" is just not accurate, and so I don't see it
> > as a valid reason to reject the patch.
>
> I mean, I think it's pretty accurate. The fact that you can point to a
> few uses of the terms "table rewrite" and "table scan" in the ALTER
> TABLE documentation doesn't prove that those terms are defined there
> or systematically discussed and it seems pretty clear to me that they
> are not. And I don't even know what we're arguing about here, because
> elsewhere in the same email you agree that it is reasonable to
> critique the patch on the basis of how well it fits into the
> documentation and at least for me that is precisely this issue.
>
> I think the bottom line here is that you're not prepared to accept as
> valid any opinion to the effect that we shouldn't commit these
> patches. But that remains my opinion.

No, I've appreciated constructive feedback from both Tom and David on
this thread. Your original email was so incredibly strongly worded
(and contained no constructive recommendations about a better path
forward, unlike Tom's and David's replies), and I had a hard time
understanding what could possibly have made you that irritated with a
proposal to document how to avoid long-running table scans while
holding an exclusive lock.

The two patches you reviewed aren't the current state of this
proposal; I'll continue working on revising to reviewers replies, and
as either a replacement or follow-on for this I like Tom's idea of
having a comprehensive guide (which I think has been needed for quite
some time).

Thanks,
James Coleman




Re: Document atthasmissing default optimization avoids verification table scan

2022-03-28 Thread Robert Haas
On Sun, Mar 27, 2022 at 1:00 PM James Coleman  wrote:
> So "undocumented concept" is just not accurate, and so I don't see it
> as a valid reason to reject the patch.

I mean, I think it's pretty accurate. The fact that you can point to a
few uses of the terms "table rewrite" and "table scan" in the ALTER
TABLE documentation doesn't prove that those terms are defined there
or systematically discussed and it seems pretty clear to me that they
are not. And I don't even know what we're arguing about here, because
elsewhere in the same email you agree that it is reasonable to
critique the patch on the basis of how well it fits into the
documentation and at least for me that is precisely this issue.

I think the bottom line here is that you're not prepared to accept as
valid any opinion to the effect that we shouldn't commit these
patches. But that remains my opinion.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Document atthasmissing default optimization avoids verification table scan

2022-03-27 Thread David G. Johnston
On Sun, Mar 27, 2022 at 11:17 AM James Coleman  wrote:

> Hmm,  I didn't realize that was project policy,


Guideline/Rule of Thumb is probably a better concept.


> but I'm a bit
> surprised given that the sentence which 0001 replaces seems like a
> direct violation of that also: "In neither case is a rewrite of the
> table required."
>
>
IMO mostly due to the structuring of the paragraphs; something like the
following makes it less problematic (and as shown below may be
sufficient to address the purpose of this patch)

"""
[...]
The following alterations of the table require the entire table, and/or its
indexes, to be rewritten; which may take a significant amount of time for a
large table, and will temporarily require as much as double the disk space.

Changing the type of an existing column will require the entire table and
its indexes to be rewritten. As an exception, if the USING clause does not
change the column contents and the old type is either binary coercible to
the new type, or an unconstrained domain over the new type, a table rewrite
is not needed; but any indexes on the affected columns must still be
rewritten.

Adding a column with a volatile DEFAULT also requires the entire table and
its indexes to be rewritten.

The reason a non-volatile (or absent) DEFAULT does not require a rewrite of
the table is because the DEFAULT expression (or NULL) is evaluated at the
time of the statement and the result is stored in the table's metadata.

The following alterations of the table require that it be scanned in its
entirety to ensure that no existing values are contrary to the new
constraints placed on the table.  Constraints backed by indexes will scan
the table as a side-effect of populating the new index with data.

Adding a CHECK constraint requires scanning the table to verify that
existing rows meet the constraint.  The same goes for adding a NOT NULL
constraint to an existing column.

A newly attached partition requires scanning the table to verify that
existing rows meet the partition constraint.

A foreign key constraint requires scanning the table to verify that all
existing values exist on the referenced table.

The main reason for providing the option to specify multiple changes in a
single ALTER TABLE is that multiple table scans or rewrites can thereby be
combined into a single pass over the table.

Scanning a large table to verify a new constraint can take a long time, and
other updates to the table are locked out until the ALTER TABLE ADD
CONSTRAINT command is committed. For CHECK and FOREIGN KEY constraints
there is an option, NOT VALID, that reduces the impact of adding a
constraint on concurrent updates. With NOT VALID, the ADD CONSTRAINT
command does not scan the table and can be committed immediately. After
that, a VALIDATE CONSTRAINT command can be issued to verify that existing
rows satisfy the constraint. The validation step does not need to lock out
concurrent updates, since it knows that other transactions will be
enforcing the constraint for rows that they insert or update; only
pre-existing rows need to be checked. Hence, validation acquires only a
SHARE UPDATE EXCLUSIVE lock on the table being altered. (If the constraint
is a foreign key then a ROW SHARE lock is also required on the table
referenced by the constraint.) In addition to improving concurrency, it can
be useful to use NOT VALID and VALIDATE CONSTRAINT in cases where the table
is known to contain pre-existing violations. Once the constraint is in
place, no new violations can be inserted, and the existing problems can be
corrected at leisure until VALIDATE CONSTRAINT finally succeeds.

The DROP COLUMN form does not physically remove the column, but simply
makes it invisible to SQL operations. Subsequent insert and update
operations in the table will store a null value for the column. Thus,
dropping a column is quick but it will not immediately reduce the on-disk
size of your table, as the space occupied by the dropped column is not
reclaimed. The space will be reclaimed over time as existing rows are
updated.

To force immediate reclamation of space occupied by a dropped column, you
can execute one of the forms of ALTER TABLE that performs a rewrite of the
whole table. This results in reconstructing each row with the dropped
column replaced by a null value.

The rewriting forms of ALTER TABLE are not MVCC-safe. After a table
rewrite, the table will appear empty to concurrent transactions, if they
are using a snapshot taken before the rewrite occurred. See Section 13.5
for more details.
[...]
"""

I'm liking the idea of breaking out multiple features into their own
sentences or paragraphs instead of saying:

"Adding a column with a volatile DEFAULT or changing the type of an
existing column"

"Adding a CHECK or NOT NULL constraint"

This later combination probably doesn't catch my attention except for this
discussion and the fact that there are multiple ways to add these
constraints and we might as well be clear 

Re: Document atthasmissing default optimization avoids verification table scan

2022-03-27 Thread James Coleman
On Sun, Mar 27, 2022 at 1:46 PM David G. Johnston
 wrote:
>
> On Sun, Mar 27, 2022 at 10:00 AM James Coleman  wrote:
>>
>> As shown above, table scans (and specifically table scans used to
>> validate constraints, which is what this patch is about) are clearly
>> documented (more than once!) in the ALTER TABLE documentation. In fact
>> it's documented specifically in reference to SET NOT NULL, which is
>> even more specifically the type of constraint this patch is about.
>>
>> So "undocumented concept" is just not accurate, and so I don't see it
>> as a valid reason to reject the patch.
>>
>
> As you point out, where these scans are performed is documented.  Your 
> request, though, is to document a location where they are not performed 
> instead of trusting in the absence of a statement meaning that no such scan 
> happens.  In this case no such scan of the table is ever needed when adding a 
> column and so ADD COLUMN doesn't mention table scanning.  We almost always 
> choose not to document those things which do not happen.  I don't always 
> agree with this position but it is valid and largely adhered to.  On that 
> documentation theory/policy basis alone this patch can be rejected. 0001 as 
> proposed is especially strong in violating this principle.

Hmm,  I didn't realize that was project policy, but I'm a bit
surprised given that the sentence which 0001 replaces seems like a
direct violation of that also: "In neither case is a rewrite of the
table required."

> My two thoughts from yesterday take slightly different approaches to try and 
> mitigate the same misunderstanding while also providing useful guidance to 
> the reader to make sure the hazard of ALTER COLUMN SET NOT NULL is something 
> they are thinking about even when adding a new column since forgetting to 
> incorporate the NOT NULL during the add can be a costly mistake.  The 
> tweaking the notes section seems to be the more productive of the two 
> approaches.

Yes, I like those suggestions. I've attached an updated patch that I
think fits a good bit more naturally into the Notes section
specifically addressing scans and rewrites on NOT NULL constraints.

Thanks for the feedback,
James Coleman


v4-0001-Document-atthasmissing-default-avoids-verificatio.patch
Description: Binary data


Re: Document atthasmissing default optimization avoids verification table scan

2022-03-27 Thread David G. Johnston
On Sun, Mar 27, 2022 at 10:00 AM James Coleman  wrote:

> As shown above, table scans (and specifically table scans used to
> validate constraints, which is what this patch is about) are clearly
> documented (more than once!) in the ALTER TABLE documentation. In fact
> it's documented specifically in reference to SET NOT NULL, which is
> even more specifically the type of constraint this patch is about.
>
> So "undocumented concept" is just not accurate, and so I don't see it
> as a valid reason to reject the patch.
>
>
As you point out, where these scans are performed is documented.  Your
request, though, is to document a location where they are not performed
instead of trusting in the absence of a statement meaning that no such scan
happens.  In this case no such scan of the table is ever needed when adding
a column and so ADD COLUMN doesn't mention table scanning.  We almost
always choose not to document those things which do not happen.  I don't
always agree with this position but it is valid and largely adhered to.  On
that documentation theory/policy basis alone this patch can be rejected.
0001 as proposed is especially strong in violating this principle.

My two thoughts from yesterday take slightly different approaches to try
and mitigate the same misunderstanding while also providing useful guidance
to the reader to make sure the hazard of ALTER COLUMN SET NOT NULL is
something they are thinking about even when adding a new column since
forgetting to incorporate the NOT NULL during the add can be a costly
mistake.  The tweaking the notes section seems to be the more productive of
the two approaches.

David J.


Re: Document atthasmissing default optimization avoids verification table scan

2022-03-27 Thread James Coleman
On Sun, Mar 27, 2022 at 11:43 AM Robert Haas  wrote:
>
> On Sat, Mar 26, 2022 at 6:25 PM James Coleman  wrote:
> > I simply do not accept the claim that this is not a reasonable concern
> > to have nor that this isn't worth documenting.
>
> I don't think I said that the concern wasn't reasonable, but I don't
> think the fact that one person or organization had a concern means
> that it has to be worth documenting.

You said "My first reaction when I read this sentence was that it
was warning the user about the absence of a hazard that no one would
expect in the first place." That seemed to me even stronger than "not
a reasonable concern", and so while I agree that one organization
having a concern doesn't mean that it has to be documented, it does
seem clear to me that one such organization dispels the idea that "no
one would expect [this]", which is why I said it in response to that
statement.

> And I didn't say either that it's
> not intrinsically worth documenting. I said it doesn't fit nicely into
> the documentation we have.

That was not the critique I took away from your email at all. It is,
however, what Tom noted, and I agree it's a relevant question.

> Since you didn't like my last example, let's try another one. If
> someone shows up and proposes a documentation patch to explain what a
> BitmapOr node means, we're probably going to reject it, because it
> makes no sense to document that one node and not all the others. That
> doesn't mean that people shouldn't want to know what BitmapOr means,
> but it's just not sensible to document that one thing in isolation,
> even if somebody somewhere happened to be confused by that thing and
> not any of the other nodes.
>
> In the same way, I think you're trying to jam documentation of one
> particular point into the documentation when there are many other
> similar points that are not documented, and I think it's very awkward.
> It looks to me like you want to document that a table scan isn't
> performed in a certain case when we haven't documented the rule that
> would cause that table scan to be performed in other cases, or even
> what a table scan means in this context, or any of the similar things
> that are equally important, like a table rewrite or an index rebuild,
> or any of the rules for when those things happen.

In the ALTER TABLE docs page "table scan" is used in the section on
"SET NOT NULL", "full table scan" is used in the sections on "ADD
table_constraint_using_index" and "ATTACH PARTITION", and "table scan"
is used again in the "Note" section. Table rewrites are similarly
discussed repeatedly on that page. Indeed the docs make a clear effort
to point out where table scans and table rewrites do and do not occur
(albeit not in one unified place -- it's scattered through the various
subcommands and notes sections. Indeed the Notes section explicitly
say "The main reason for providing the option to specify multiple
changes in a single ALTER TABLE is that multiple table scans or
rewrites can thereby be combined into a single pass over the table."

So I believe it is just factually incorrect to say that "we haven't
documented...what a table scan means in this context, or any of the
similar things that are equally important, like a table rewrite or an
index rebuild, or any of the rules for when those things happen."

> It's arguable in my mind whether it is worth documenting all of those
> rules, although I am not opposed to it if somebody wants to do the
> work. But I *am* opposed to documenting that a certain situation is an
> exception to an undocumented rule about an undocumented concept.
> That's going to create confusion, not dispel it.

As shown above, table scans (and specifically table scans used to
validate constraints, which is what this patch is about) are clearly
documented (more than once!) in the ALTER TABLE documentation. In fact
it's documented specifically in reference to SET NOT NULL, which is
even more specifically the type of constraint this patch is about.

So "undocumented concept" is just not accurate, and so I don't see it
as a valid reason to reject the patch.

Thanks,
James Coleman




Re: Document atthasmissing default optimization avoids verification table scan

2022-03-27 Thread Robert Haas
On Sat, Mar 26, 2022 at 6:25 PM James Coleman  wrote:
> I simply do not accept the claim that this is not a reasonable concern
> to have nor that this isn't worth documenting.

I don't think I said that the concern wasn't reasonable, but I don't
think the fact that one person or organization had a concern means
that it has to be worth documenting. And I didn't say either that it's
not intrinsically worth documenting. I said it doesn't fit nicely into
the documentation we have.

Since you didn't like my last example, let's try another one. If
someone shows up and proposes a documentation patch to explain what a
BitmapOr node means, we're probably going to reject it, because it
makes no sense to document that one node and not all the others. That
doesn't mean that people shouldn't want to know what BitmapOr means,
but it's just not sensible to document that one thing in isolation,
even if somebody somewhere happened to be confused by that thing and
not any of the other nodes.

In the same way, I think you're trying to jam documentation of one
particular point into the documentation when there are many other
similar points that are not documented, and I think it's very awkward.
It looks to me like you want to document that a table scan isn't
performed in a certain case when we haven't documented the rule that
would cause that table scan to be performed in other cases, or even
what a table scan means in this context, or any of the similar things
that are equally important, like a table rewrite or an index rebuild,
or any of the rules for when those things happen.

It's arguable in my mind whether it is worth documenting all of those
rules, although I am not opposed to it if somebody wants to do the
work. But I *am* opposed to documenting that a certain situation is an
exception to an undocumented rule about an undocumented concept.
That's going to create confusion, not dispel it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Document atthasmissing default optimization avoids verification table scan

2022-03-26 Thread David G. Johnston
On Sat, Mar 26, 2022 at 4:36 PM Tom Lane  wrote:

> "David G. Johnston"  writes:
> > Or, we can leave it where things are and make sure the reader understands
> > there are two paths to having a NOT NULL constraint on the newly added
> > column.  Something like:
>
> > "If you plan on having a NOT NULL constraint on the newly added column
> you
> > should add it as a column constraint during the ADD COLUMN command.  If
> you
> > add it later via ALTER COLUMN SET NOT NULL the table will have to be
> > completely scanned in order to ensure that no null values were inserted."
>
> The first way also requires having a non-null DEFAULT, of course, and
> then also that default value must be a constant (else you end up with
> a table rewrite which is even worse).  This sort of interaction
> between features is why I feel that a separate unified discussion
> is the only reasonable solution.
>
>
The paragraph it is being added to discusses the table rewrite already.
This does nothing to contradict the fact that a table rewrite might still
have to happen.

The goal of this sentence is to tell the user to make sure they don't
forget to add the NOT NULL during the column add so that they don't have to
incur a future table scan by executing ALTER COLUMN SET NOT NULL.

I am assuming that the user understands when a table rewrite has to happen
and that the presence of NOT NULL in the ADD COLUMN doesn't impact that.
And if a table rewrite happens that a table scan happens implicitly.
Admittedly, this doesn't directly address the original complaint, but by
showing how the two commands differ I believe the confusion will go away.
SET NOT NULL performs a scan, ADD COLUMN NOT NULL does not; it just might
require something worse if the supplied default is volatile.

David J.


Re: Document atthasmissing default optimization avoids verification table scan

2022-03-26 Thread Tom Lane
"David G. Johnston"  writes:
> Or, we can leave it where things are and make sure the reader understands
> there are two paths to having a NOT NULL constraint on the newly added
> column.  Something like:

> "If you plan on having a NOT NULL constraint on the newly added column you
> should add it as a column constraint during the ADD COLUMN command.  If you
> add it later via ALTER COLUMN SET NOT NULL the table will have to be
> completely scanned in order to ensure that no null values were inserted."

The first way also requires having a non-null DEFAULT, of course, and
then also that default value must be a constant (else you end up with
a table rewrite which is even worse).  This sort of interaction
between features is why I feel that a separate unified discussion
is the only reasonable solution.

regards, tom lane




Re: Document atthasmissing default optimization avoids verification table scan

2022-03-26 Thread David G. Johnston
On Sat, Mar 26, 2022 at 4:14 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

>
> I would suggest rewriting 0001 to target ALTER COLUMN instead of in the
> generic notes section (in the paragraph beginning "Adding a column with a
> volatile DEFAULT") for the desired clarification.
>
>
Or, we can leave it where things are and make sure the reader understands
there are two paths to having a NOT NULL constraint on the newly added
column.  Something like:

"If you plan on having a NOT NULL constraint on the newly added column you
should add it as a column constraint during the ADD COLUMN command.  If you
add it later via ALTER COLUMN SET NOT NULL the table will have to be
completely scanned in order to ensure that no null values were inserted."

David J.


Re: Document atthasmissing default optimization avoids verification table scan

2022-03-26 Thread David G. Johnston
On Sat, Mar 26, 2022 at 3:25 PM James Coleman  wrote:

> On Fri, Mar 25, 2022 at 4:40 PM Robert Haas  wrote:
> >
> > On Tue, Jan 25, 2022 at 8:49 AM James Coleman  wrote:
> > > Here's a version that looks like that. I'm not convinced it's an
> > > improvement over the previous version: again, I expect more advanced
> > > users to already understand this concept, and I think moving it to the
> > > ALTER TABLE page could very well have the effect of burying i(amidst
> > > the ton of detail on the ALTER TABLE page) concept that would be
> > > useful to learn early on in a tutorial like the DDL page. But if
> > > people really think this is an improvement, then I can acquiesce.
> >
> > I vote for rejecting both of these patches.
> >
> > 0001 adds the following sentence to the documentation: "A NOT
> > NULL constraint may be added to the new column in the same
> > statement without requiring scanning the table to verify the
> > constraint." My first reaction when I read this sentence was that it
> > was warning the user about the absence of a hazard that no one would
> > expect in the first place. We could also document that adding a NOT
> > NULL constraint will not cause your gas tank to catch fire, but nobody
> > was worried about that until we brought it up.
>
> As noted at minimum we (Braintree Payments) feared this hazard. That's
> reasonable because adding a NOT NULL constraint normally requires a
> table scan while holding an exclusive lock. It's fairly obvious why
> someone like us (any anyone who can't have downtime) would be paranoid
> about any possibility of long-running operations under exclusive locks
>
>
Reading the docs again I see:

ALTER TABLE ... ALTER COLUMN ... SET/DROP NOT NULL
"SET NOT NULL may only be applied to a column provided none of the records
in the table contain a NULL value for the column. Ordinarily this is
checked during the ALTER TABLE by scanning the entire table; however, if a
valid CHECK constraint is found which proves no NULL can exist, then the
table scan is skipped."

And the claim is that the reader would read this behavior of the ALTER
COLUMN ... SET NOT NULL command and assume that it might also apply to:

ALTER TABLE ... ADD COLUMN ... DEFAULT NOT NULL

I accept that such an assumption is plausible and worth disabusing
(regardless of my opinion, that is, to my understanding, why this patch is
being introduced).

To that end we should do so in the ALTER COLUMN ... SET NOT NULL section,
not the ADD COLUMN ... DEFAULT NOT NULL (or, specifically, its
corresponding paragraph in the notes section).

I would suggest rewriting 0001 to target ALTER COLUMN instead of in the
generic notes section (in the paragraph beginning "Adding a column with a
volatile DEFAULT") for the desired clarification.

0002, the moving of existing content from DDL to ALTER TABLE, does not have
agreement and the author of this patch isn't behind it.  I'm not inclined
to introduce a patch to push forth the discussion to conclusion myself.  So
for now it should just die.

David J.


Re: Document atthasmissing default optimization avoids verification table scan

2022-03-26 Thread James Coleman
On Fri, Mar 25, 2022 at 5:00 PM Tom Lane  wrote:
>
> Robert Haas  writes:
> > I vote for rejecting both of these patches.
>
> I see what James is on about here, but I agree that these specific changes
> don't help much.  What would actually be desirable IMO is a separate
> section somewhere explaining the performance characteristics of ALTER
> TABLE.  (We've also kicked around the idea of EXPLAIN for ALTER TABLE,
> but that's a lot more work.)  This could coalesce the parenthetical
> remarks that exist in ddl.sgml as well as alter_table.sgml into
> something a bit more unified and perhaps easier to follow.  In particular,
> it should start by defining what we mean by "table rewrite" and "table
> scan".  I don't recall at the moment whether we define those in multiple
> places or not at all, but as things stand any such discussion would be
> pretty fragmented.
>
> regards, tom lane

I think a unified area discussing pitfalls/performance of ALTER TABLE
seems like a great idea.

That being said: given that "as things stand" that "discussion
[already is] pretty fragmented" is there a place for a simpler
improvement (adding a short explanation of this particular hazard) in
the meantime? I don't mean this specific v4 patch -- just in general
(since the patch can be revised of course).

Thanks,
James Coleman




Re: Document atthasmissing default optimization avoids verification table scan

2022-03-26 Thread James Coleman
On Fri, Mar 25, 2022 at 4:40 PM Robert Haas  wrote:
>
> On Tue, Jan 25, 2022 at 8:49 AM James Coleman  wrote:
> > Here's a version that looks like that. I'm not convinced it's an
> > improvement over the previous version: again, I expect more advanced
> > users to already understand this concept, and I think moving it to the
> > ALTER TABLE page could very well have the effect of burying i(amidst
> > the ton of detail on the ALTER TABLE page) concept that would be
> > useful to learn early on in a tutorial like the DDL page. But if
> > people really think this is an improvement, then I can acquiesce.
>
> I vote for rejecting both of these patches.
>
> 0001 adds the following sentence to the documentation: "A NOT
> NULL constraint may be added to the new column in the same
> statement without requiring scanning the table to verify the
> constraint." My first reaction when I read this sentence was that it
> was warning the user about the absence of a hazard that no one would
> expect in the first place. We could also document that adding a NOT
> NULL constraint will not cause your gas tank to catch fire, but nobody
> was worried about that until we brought it up.

As noted at minimum we (Braintree Payments) feared this hazard. That's
reasonable because adding a NOT NULL constraint normally requires a
table scan while holding an exclusive lock. It's fairly obvious why
someone like us (any anyone who can't have downtime) would be paranoid
about any possibility of long-running operations under exclusive locks

I realize it's rhetorical flourish, but it hardly seems reasonable to
compare an actual hazard a database could plausibly have (an index it
is an optimization in the code that prevents it from happening -- a
naive implementation would in fact scan the full table on all NOT NULL
constraint additions) with something not at all related to database
(gas tank fires).

I simply do not accept the claim that this is not a reasonable concern
to have nor that this isn't worth documenting. It was worth someone
taking the time to consider as an optimization in the code. And the
consequence of that not having been done could be an outage for an
unsuspecting user. Of all the things we would want to document DDL
that could require executing long operations while holding exclusive
locks seems pretty high on the list.

> I also think that the
> sentence makes the rest of the paragraph harder to understand, because
> the rest of the paragraph is talking about adding a new column with a
> default, and now suddenly we're talking about NOT NULL constraints.

I am, however, happy to hear critiques of the style of the patch or
the best way to document this kind of behavior.

I'm curious though what you'd envision being a better place for this
information. Yes, we're talking about new columns -- that's the
operation under consideration -- but the NOT NULL constraint is part
of the new column definition. I'm not sure where else you would
document something that's a part of adding a new column.

> 0002 moves some advice about adding columns with defaults from one
> part of the documentation to another. Maybe that's a good idea, and
> maybe it isn't, but it also rewords the advice, and in my opinion, the
> new wording is less clear and specific than the existing wording. It
> also changes a sentence that mentions volatile defaults to give a
> specific example of a volatile function -- clock_timestamp -- probably
> because where the documentation was before that function was
> mentioned. However, that sentence seems clear enough as it is and does
> not really need an example.

Adding that example (and, indeed, moving the advice) was per a
previous reviewer's request. So I'm not sure what to do in this
situation -- I'm trying to improve the proposal per reviewer feedback
but there are conflicting reviewers. I suppose we'd need a
tie-breaking reviewer.

Thanks,
James Coleman




Re: Document atthasmissing default optimization avoids verification table scan

2022-03-26 Thread Robert Haas
On Fri, Mar 25, 2022 at 5:00 PM Tom Lane  wrote:
> I see what James is on about here, but I agree that these specific changes
> don't help much.  What would actually be desirable IMO is a separate
> section somewhere explaining the performance characteristics of ALTER
> TABLE.

Sure. If someone wants to do that and bring it to a level of quality
that we could consider committing, I'm fine with that. But I don't
think that has much to do with the patches before us.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Document atthasmissing default optimization avoids verification table scan

2022-03-25 Thread David G. Johnston
On Fri, Mar 25, 2022 at 1:40 PM Robert Haas  wrote:

> On Tue, Jan 25, 2022 at 8:49 AM James Coleman  wrote:
> > Here's a version that looks like that. I'm not convinced it's an
> > improvement over the previous version: again, I expect more advanced
> > users to already understand this concept, and I think moving it to the
> > ALTER TABLE page could very well have the effect of burying i(amidst
> > the ton of detail on the ALTER TABLE page) concept that would be
> > useful to learn early on in a tutorial like the DDL page. But if
> > people really think this is an improvement, then I can acquiesce.
>
> I vote for rejecting both of these patches.
>
> 0001 adds the following sentence to the documentation: "A NOT
> NULL constraint may be added to the new column in the same
> statement without requiring scanning the table to verify the
> constraint." My first reaction when I read this sentence was that it
> was warning the user about the absence of a hazard that no one would
> expect in the first place.


I agree.  The wording that would make one even consider this has yet to
have been introduced at this point in the documentation.


> 0002 moves some advice about adding columns with defaults from one
> part of the documentation to another. Maybe that's a good idea, and
> maybe it isn't, but it also rewords the advice, and in my opinion, the
> new wording is less clear and specific than the existing wording.


In the passing time I've had to directly reference the DDL chapter (which
is a mix of reference material and tutorial) on numerous items so my desire
to move the commentary away from here is less, but still I feel that the
command reference page is the correct place for this kind of detail.

If we took away too much info and made things less clear let's address
that.  It can't be that much, we are talking about basically a paragraph of
text here.


> It
> also changes a sentence that mentions volatile defaults to give a
> specific example of a volatile function -- clock_timestamp -- probably
> because where the documentation was before that function was
> mentioned.  However, that sentence seems clear enough as it is and does

not really need an example.
>

Nope, the usage and context in the patch is basically the same as the
existing usage and context.
David J.


Re: Document atthasmissing default optimization avoids verification table scan

2022-03-25 Thread Tom Lane
Robert Haas  writes:
> I vote for rejecting both of these patches.

I see what James is on about here, but I agree that these specific changes
don't help much.  What would actually be desirable IMO is a separate
section somewhere explaining the performance characteristics of ALTER
TABLE.  (We've also kicked around the idea of EXPLAIN for ALTER TABLE,
but that's a lot more work.)  This could coalesce the parenthetical
remarks that exist in ddl.sgml as well as alter_table.sgml into
something a bit more unified and perhaps easier to follow.  In particular,
it should start by defining what we mean by "table rewrite" and "table
scan".  I don't recall at the moment whether we define those in multiple
places or not at all, but as things stand any such discussion would be
pretty fragmented.

regards, tom lane




Re: Document atthasmissing default optimization avoids verification table scan

2022-03-25 Thread Robert Haas
On Tue, Jan 25, 2022 at 8:49 AM James Coleman  wrote:
> Here's a version that looks like that. I'm not convinced it's an
> improvement over the previous version: again, I expect more advanced
> users to already understand this concept, and I think moving it to the
> ALTER TABLE page could very well have the effect of burying i(amidst
> the ton of detail on the ALTER TABLE page) concept that would be
> useful to learn early on in a tutorial like the DDL page. But if
> people really think this is an improvement, then I can acquiesce.

I vote for rejecting both of these patches.

0001 adds the following sentence to the documentation: "A NOT
NULL constraint may be added to the new column in the same
statement without requiring scanning the table to verify the
constraint." My first reaction when I read this sentence was that it
was warning the user about the absence of a hazard that no one would
expect in the first place. We could also document that adding a NOT
NULL constraint will not cause your gas tank to catch fire, but nobody
was worried about that until we brought it up. I also think that the
sentence makes the rest of the paragraph harder to understand, because
the rest of the paragraph is talking about adding a new column with a
default, and now suddenly we're talking about NOT NULL constraints.

0002 moves some advice about adding columns with defaults from one
part of the documentation to another. Maybe that's a good idea, and
maybe it isn't, but it also rewords the advice, and in my opinion, the
new wording is less clear and specific than the existing wording. It
also changes a sentence that mentions volatile defaults to give a
specific example of a volatile function -- clock_timestamp -- probably
because where the documentation was before that function was
mentioned. However, that sentence seems clear enough as it is and does
not really need an example.

I am not trying to pretend that all of our documentation in this area
is totally ideal or that nothing can be done to make it better.
However, I don't think that these particular patches actually make it
better. And I also think that there's only so much time that is worth
spending on a patch set like this. Not everything that is confusing
about the system is ever going to make its way into the documentation,
and that would remain true even if we massively expanded the level of
detail that we put in there. That doesn't mean that James or anyone
else shouldn't suggest things to add as they find things that they
think should be added, but it does mean that not every such suggestion
is going to get traction and that's OK too.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Document atthasmissing default optimization avoids verification table scan

2022-01-25 Thread James Coleman
On Sat, Jan 22, 2022 at 10:28 AM David G. Johnston
 wrote:
>
>
>
> On Saturday, January 22, 2022, James Coleman  wrote:
>>
>> On Sat, Jan 22, 2022 at 12:35 AM David G. Johnston
>>  wrote:
>> >
>> > On Fri, Jan 21, 2022 at 5:14 PM James Coleman  wrote:
>> >>
>> >>
>> >> > Really?  That's horrid, because that's directly useful advice.
>> >>
>> >> Remedied, but rewritten a bit to better fit with the new style/goal of
>> >> that tip).
>> >>
>> >> Version 3 is attached.
>> >>
>> >
>> > Coming back to this after a respite I think the tip needs to be moved just 
>> > like everything else.  For much the same reason (though this may only be a 
>> > personal bias), I know what SQL Commands do the various things that DDL 
>> > encompasses (especially the basics like adding a column) and so the DDL 
>> > section is really just a tutorial-like chapter that I will generally 
>> > forget about because I will go straight to the official source which is 
>> > the SQL Command Reference.  My future self would want the tip to show up 
>> > there.  If we put the tip after the existing paragraph that starts: 
>> > "Adding a column with a volatile DEFAULT or changing the type of an 
>> > existing column..." the need to specify an example function in the tip 
>> > goes away - though maybe it should be moved to the notes paragraph 
>> > instead: "with a volatile DEFAULT (e.g., clock_timestamp()) or  changing 
>> > the type of an existing column..."
>>
>> In my mind that actually might be a reason to keep it that way. I
>> expect someone who's somewhat experienced to know there are things
>> (like table rewrites and scans) you need to consider and therefore go
>> to the ALTER TABLE page and read the details. But for someone newer
>> the tutorial page needs to introduce them to the idea that those
>> gotchas exist.
>>
>
> Readers of the DDL page are given a hint of the issues and directed to 
> additional, arguably mandatory, reading.  They can not worry about the 
> nuances during their learning phase but instead can defer that reading until 
> they actually have need to alter a (large) table.  But expecting them to read 
> the command reference page is reasonable and is IMO the more probable place 
> they will look when they start doing stuff in earnest.  For the inexperienced 
> reader breaking this up in this manner based upon depth of detail feels right 
> to me.

Here's a version that looks like that. I'm not convinced it's an
improvement over the previous version: again, I expect more advanced
users to already understand this concept, and I think moving it to the
ALTER TABLE page could very well have the effect of burying i(amidst
the ton of detail on the ALTER TABLE page) concept that would be
useful to learn early on in a tutorial like the DDL page. But if
people really think this is an improvement, then I can acquiesce.

Thanks,
James Coleman
From ffca825ca27cffc70c7eb39385545a76fa0d9e2d Mon Sep 17 00:00:00 2001
From: James Coleman 
Date: Fri, 24 Sep 2021 09:59:27 -0400
Subject: [PATCH v4 1/2] Document atthasmissing default avoids verification
 table scan

When PG11 added the ability for ALTER TABLE ADD COLUMN to set a constant
default value without rewriting the table the doc changes did not note
how the new feature interplayed with ADD COLUMN DEFAULT NOT NULL.
Since adding a NOT NULL constraint requires a verification table scan to
ensure no values are null, users want to know that the combined
operation also avoids the table scan.
---
 doc/src/sgml/ref/alter_table.sgml | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index a76e2e7322..1dde16fa39 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -1355,7 +1355,9 @@ WITH ( MODULUS numeric_literal, REM
 evaluated at the time of the statement and the result stored in the
 table's metadata.  That value will be used for the column for all existing
 rows.  If no DEFAULT is specified, NULL is used.  In
-neither case is a rewrite of the table required.
+neither case is a rewrite of the table required.  A NOT NULL
+constraint may be added to the new column in the same statement without
+requiring scanning the table to verify the constraint.

 

-- 
2.17.1

From b1019154c749991a3e23cd8e5f82b31acbbdddc9 Mon Sep 17 00:00:00 2001
From: jcoleman 
Date: Fri, 21 Jan 2022 18:50:39 +
Subject: [PATCH v4 2/2] Don't double document ADD COLUMN optimization details

Instead point people to the source of truth. This also avoids using
different language ("constant" versus "non-volatile").
---
 doc/src/sgml/ddl.sgml | 22 --
 doc/src/sgml/ref/alter_table.sgml | 11 ++-
 2 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 22f6c5c7ab..0ec1b7cd39 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -1406,24 +1406,10 

Re: Document atthasmissing default optimization avoids verification table scan

2022-01-22 Thread David G. Johnston
On Saturday, January 22, 2022, James Coleman  wrote:

> On Sat, Jan 22, 2022 at 12:35 AM David G. Johnston
>  wrote:
> >
> > On Fri, Jan 21, 2022 at 5:14 PM James Coleman  wrote:
> >>
> >>
> >> > Really?  That's horrid, because that's directly useful advice.
> >>
> >> Remedied, but rewritten a bit to better fit with the new style/goal of
> >> that tip).
> >>
> >> Version 3 is attached.
> >>
> >
> > Coming back to this after a respite I think the tip needs to be moved
> just like everything else.  For much the same reason (though this may only
> be a personal bias), I know what SQL Commands do the various things that
> DDL encompasses (especially the basics like adding a column) and so the DDL
> section is really just a tutorial-like chapter that I will generally forget
> about because I will go straight to the official source which is the SQL
> Command Reference.  My future self would want the tip to show up there.  If
> we put the tip after the existing paragraph that starts: "Adding a column
> with a volatile DEFAULT or changing the type of an existing column..." the
> need to specify an example function in the tip goes away - though maybe it
> should be moved to the notes paragraph instead: "with a volatile DEFAULT
> (e.g., clock_timestamp()) or  changing the type of an existing column..."
>
> In my mind that actually might be a reason to keep it that way. I
> expect someone who's somewhat experienced to know there are things
> (like table rewrites and scans) you need to consider and therefore go
> to the ALTER TABLE page and read the details. But for someone newer
> the tutorial page needs to introduce them to the idea that those
> gotchas exist.
>
>
Readers of the DDL page are given a hint of the issues and directed to
additional, arguably mandatory, reading.  They can not worry about the
nuances during their learning phase but instead can defer that reading
until they actually have need to alter a (large) table.  But expecting them
to read the command reference page is reasonable and is IMO the more
probable place they will look when they start doing stuff in earnest.  For
the inexperienced reader breaking this up in this manner based upon depth
of detail feels right to me.

David J.


Re: Document atthasmissing default optimization avoids verification table scan

2022-01-22 Thread James Coleman
On Sat, Jan 22, 2022 at 12:35 AM David G. Johnston
 wrote:
>
> On Fri, Jan 21, 2022 at 5:14 PM James Coleman  wrote:
>>
>>
>> > Really?  That's horrid, because that's directly useful advice.
>>
>> Remedied, but rewritten a bit to better fit with the new style/goal of
>> that tip).
>>
>> Version 3 is attached.
>>
>
> Coming back to this after a respite I think the tip needs to be moved just 
> like everything else.  For much the same reason (though this may only be a 
> personal bias), I know what SQL Commands do the various things that DDL 
> encompasses (especially the basics like adding a column) and so the DDL 
> section is really just a tutorial-like chapter that I will generally forget 
> about because I will go straight to the official source which is the SQL 
> Command Reference.  My future self would want the tip to show up there.  If 
> we put the tip after the existing paragraph that starts: "Adding a column 
> with a volatile DEFAULT or changing the type of an existing column..." the 
> need to specify an example function in the tip goes away - though maybe it 
> should be moved to the notes paragraph instead: "with a volatile DEFAULT 
> (e.g., clock_timestamp()) or  changing the type of an existing column..."

In my mind that actually might be a reason to keep it that way. I
expect someone who's somewhat experienced to know there are things
(like table rewrites and scans) you need to consider and therefore go
to the ALTER TABLE page and read the details. But for someone newer
the tutorial page needs to introduce them to the idea that those
gotchas exist.

Thoughts?
James Coleman




Re: Document atthasmissing default optimization avoids verification table scan

2022-01-21 Thread David G. Johnston
On Fri, Jan 21, 2022 at 5:14 PM James Coleman  wrote:

>
> > Really?  That's horrid, because that's directly useful advice.
>
> Remedied, but rewritten a bit to better fit with the new style/goal of
> that tip).
>
> Version 3 is attached.
>
>
Coming back to this after a respite I think the tip needs to be moved just
like everything else.  For much the same reason (though this may only be a
personal bias), I know what SQL Commands do the various things that DDL
encompasses (especially the basics like adding a column) and so the DDL
section is really just a tutorial-like chapter that I will generally forget
about because I will go straight to the official source which is the SQL
Command Reference.  My future self would want the tip to show up there.  If
we put the tip after the existing paragraph that starts: "Adding a column
with a volatile DEFAULT or changing the type of an existing column..." the
need to specify an example function in the tip goes away - though maybe it
should be moved to the notes paragraph instead: "with a volatile DEFAULT
(e.g., clock_timestamp()) or  changing the type of an existing column..."

David J.


Re: Document atthasmissing default optimization avoids verification table scan

2022-01-21 Thread James Coleman
On Fri, Jan 21, 2022 at 5:38 PM Tom Lane  wrote:
>
> "David G. Johnston"  writes:
> > You've removed the "constraint verification scan" portion of this.
>
> Indeed, because that's got nothing to do with adding a new column
> (per se; adding a constraint along with the column is a different
> can of worms).

Yeah. Initially I'd thought I'd wanted it there, but by explicitly
linking people to the ALTER TABLE docs for more details (I've made
that a link now too) I'm now inclined to agree that tightly focusing
the tip is better form.

> > Re-reading this, the recommendation:
>
> > - However, if the default value is volatile (e.g.,
> > - clock_timestamp())
> > - each row will need to be updated with the value calculated at the time
> > - ALTER TABLE is executed. To avoid a potentially
> > - lengthy update operation, particularly if you intend to fill the
> > column
> > - with mostly nondefault values anyway, it may be preferable to add the
> > - column with no default, insert the correct values using
> > - UPDATE, and then add any desired default as
> > described
> > - below.
>
> > has now been completely removed from the documentation.
>
> Really?  That's horrid, because that's directly useful advice.

Remedied, but rewritten a bit to better fit with the new style/goal of
that tip).

Version 3 is attached.

James Coleman
From ffca825ca27cffc70c7eb39385545a76fa0d9e2d Mon Sep 17 00:00:00 2001
From: James Coleman 
Date: Fri, 24 Sep 2021 09:59:27 -0400
Subject: [PATCH v3 1/2] Document atthasmissing default avoids verification
 table scan

When PG11 added the ability for ALTER TABLE ADD COLUMN to set a constant
default value without rewriting the table the doc changes did not note
how the new feature interplayed with ADD COLUMN DEFAULT NOT NULL.
Since adding a NOT NULL constraint requires a verification table scan to
ensure no values are null, users want to know that the combined
operation also avoids the table scan.
---
 doc/src/sgml/ref/alter_table.sgml | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index a76e2e7322..1dde16fa39 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -1355,7 +1355,9 @@ WITH ( MODULUS numeric_literal, REM
 evaluated at the time of the statement and the result stored in the
 table's metadata.  That value will be used for the column for all existing
 rows.  If no DEFAULT is specified, NULL is used.  In
-neither case is a rewrite of the table required.
+neither case is a rewrite of the table required.  A NOT NULL
+constraint may be added to the new column in the same statement without
+requiring scanning the table to verify the constraint.

 

-- 
2.17.1

From 3f119b3f67f6452ff7594d4f19d60ca17a09e19f Mon Sep 17 00:00:00 2001
From: jcoleman 
Date: Fri, 21 Jan 2022 18:50:39 +
Subject: [PATCH v3 2/2] Don't double document ADD COLUMN optimization details

Instead point people to the source of truth. This also avoids using
different language ("constant" versus "non-volatile").
---
 doc/src/sgml/ddl.sgml | 24 
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 22f6c5c7ab..efd9542252 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -1406,25 +1406,17 @@ ALTER TABLE products ADD COLUMN description text;
 

 
- From PostgreSQL 11, adding a column with
- a constant default value no longer means that each row of the table
- needs to be updated when the ALTER TABLE statement
- is executed. Instead, the default value will be returned the next time
- the row is accessed, and applied when the table is rewritten, making
- the ALTER TABLE very fast even on large tables.
-
+ Adding a new column can require rewriting the whole table,
+ making it slow for large tables.  However, the rewrite can be optimized
+ away in some cases, depending on what default value is given to the
+ column.  See  for details.
 
-
- However, if the default value is volatile (e.g.,
- clock_timestamp())
- each row will need to be updated with the value calculated at the time
- ALTER TABLE is executed. To avoid a potentially
- lengthy update operation, particularly if you intend to fill the column
- with mostly nondefault values anyway, it may be preferable to add the
+ When a rewrite is required (e.g., a volatile default value like
+ clock_timestamp()) it may be preferable to add the
  column with no default, insert the correct values using
  UPDATE, and then add any desired default as described
- below.
-
+ below.  This is particularly true if you intend to fill the column
+ with mostly nondefault values anyway.

 

-- 
2.17.1



Re: Document atthasmissing default optimization avoids verification table scan

2022-01-21 Thread James Coleman
On Fri, Jan 21, 2022 at 4:08 PM Andrew Dunstan  wrote:
>
>
> On 1/21/22 13:55, James Coleman wrote:
> > On Thu, Jan 20, 2022 at 3:43 PM James Coleman  wrote:
> >> As noted earlier I expect to be posting an updated patch soon.
> > Here's the updated series. In 0001 I've moved the documentation tweak
> > into the ALTER TABLE notes section. In 0002 I've taken David J's
> > suggestion of shortening the "Tip" on the DDL page and mostly using it
> > to point people to the Notes section on the ALTER TABLE page.
>
>
> I don't really like the first part of patch 1, but as it gets removed by
> patch 2 we can move past that.

At first I was very confused by this feedback, but after looking at
the patch files I sent, that's my fault: I meant to remove the
modification of the "Tip" section but somehow missed that in what I
sent. I'll correct that in the next patch series.

James Coleman




Re: Document atthasmissing default optimization avoids verification table scan

2022-01-21 Thread Tom Lane
"David G. Johnston"  writes:
> You've removed the "constraint verification scan" portion of this.

Indeed, because that's got nothing to do with adding a new column
(per se; adding a constraint along with the column is a different
can of worms).

> Re-reading this, the recommendation:

> - However, if the default value is volatile (e.g.,
> - clock_timestamp())
> - each row will need to be updated with the value calculated at the time
> - ALTER TABLE is executed. To avoid a potentially
> - lengthy update operation, particularly if you intend to fill the
> column
> - with mostly nondefault values anyway, it may be preferable to add the
> - column with no default, insert the correct values using
> - UPDATE, and then add any desired default as
> described
> - below.

> has now been completely removed from the documentation.

Really?  That's horrid, because that's directly useful advice.

regards, tom lane




Re: Document atthasmissing default optimization avoids verification table scan

2022-01-21 Thread David G. Johnston
On Fri, Jan 21, 2022 at 2:50 PM Tom Lane  wrote:

> "David G. Johnston"  writes:
> > On Fri, Jan 21, 2022 at 2:08 PM Andrew Dunstan 
> wrote:
> >> I know what it's replacing refers to release 11, but let's stop doing
> >> that. How about something like this?
> >>
> >> Adding a new column can sometimes require rewriting the table,
> >> making it a very slow operation. However in many cases this rewrite
> >> and related verification scans can be optimized away by using an
> >> appropriate default value. See the notes in ALTER
> >> TABLE for details.
>
> > I think it is a virtue, and am supported in that feeling by the existing
> > wording, to be explicit about the release before which these
> optimizations
> > can not happen.  The docs generally use this to good effect without
> > overdoing it.  This is a prime example.
>
> The fact of the matter is that optimizations of this sort have existed
> for years.  (For example, I think we've optimized away the rewrite
> when the new column is DEFAULT NULL since the very beginning.)  So it
> does not help to write the text as if there were no such optimizations
> before version N and they were all there in N.
>

Fair point, and indeed the v10 docs do mention the NULL (or no default)
optimization.


> I agree that Andrew's text could stand a pass of "omit needless words".
> But I also think that we could be a bit more explicit about what "slow"
> means.  Maybe like
>
> Adding a new column can require rewriting the whole table,
> making it slow for large tables.  However the rewrite can be optimized
> away in some cases, depending on what default value is given to the
> column.  See ALTER TABLE for details.
>
>
Comma needed after however.
You've removed the "constraint verification scan" portion of this. Maybe:
"""
...
column.  The same applies for the NOT NULL constraint verification scan.
See ALTER TABLE for details.
"""


Re-reading this, the recommendation:

- However, if the default value is volatile (e.g.,
- clock_timestamp())
- each row will need to be updated with the value calculated at the time
- ALTER TABLE is executed. To avoid a potentially
- lengthy update operation, particularly if you intend to fill the
column
- with mostly nondefault values anyway, it may be preferable to add the
- column with no default, insert the correct values using
- UPDATE, and then add any desired default as
described
- below.

has now been completely removed from the documentation.  I suggest having
this remain as the Tip and turning the optimization stuff into a Note.


> (the ALTER TABLE reference should be a link, too)
>

Yeah, the page does have a link already (fairly close by...) but with these
changes putting one here seems to make sense.

David J.


Re: Document atthasmissing default optimization avoids verification table scan

2022-01-21 Thread Tom Lane
"David G. Johnston"  writes:
> On Fri, Jan 21, 2022 at 2:08 PM Andrew Dunstan  wrote:
>> I know what it's replacing refers to release 11, but let's stop doing
>> that. How about something like this?
>> 
>> Adding a new column can sometimes require rewriting the table,
>> making it a very slow operation. However in many cases this rewrite
>> and related verification scans can be optimized away by using an
>> appropriate default value. See the notes in ALTER
>> TABLE for details.

> I think it is a virtue, and am supported in that feeling by the existing
> wording, to be explicit about the release before which these optimizations
> can not happen.  The docs generally use this to good effect without
> overdoing it.  This is a prime example.

The fact of the matter is that optimizations of this sort have existed
for years.  (For example, I think we've optimized away the rewrite
when the new column is DEFAULT NULL since the very beginning.)  So it
does not help to write the text as if there were no such optimizations
before version N and they were all there in N.

I agree that Andrew's text could stand a pass of "omit needless words".
But I also think that we could be a bit more explicit about what "slow"
means.  Maybe like

Adding a new column can require rewriting the whole table,
making it slow for large tables.  However the rewrite can be optimized
away in some cases, depending on what default value is given to the
column.  See ALTER TABLE for details.

(the ALTER TABLE reference should be a link, too)

regards, tom lane




Re: Document atthasmissing default optimization avoids verification table scan

2022-01-21 Thread David G. Johnston
On Fri, Jan 21, 2022 at 2:08 PM Andrew Dunstan  wrote:

> On 1/21/22 13:55, James Coleman wrote:
>
> + Before PostgreSQL 11, adding a new
> column to a
> + table required rewriting that table, making it a very slow operation.
> + More recent versions can sometimes optimize away this rewrite and
> related
> + validation scans.  See the notes in ALTER TABLE
> for details.
>
>
> I know what it's replacing refers to release 11, but let's stop doing
> that. How about something like this?
>
> Adding a new column can sometimes require rewriting the table,
> making it a very slow operation. However in many cases this rewrite
> and related verification scans can be optimized away by using an
> appropriate default value. See the notes in ALTER
> TABLE for details.
>

I think it is a virtue, and am supported in that feeling by the existing
wording, to be explicit about the release before which these optimizations
can not happen.  The docs generally use this to good effect without
overdoing it.  This is a prime example.

The combined effect of "sometimes", "in many", "can be", and "an
appropriate" make this version harder to read than it probably needs to
be.  I like the patch as-is over this; but I would want to give an
alternative wording more thought if it is insisted upon that mention of
PostgreSQL 11 goes away.

David J.


Re: Document atthasmissing default optimization avoids verification table scan

2022-01-21 Thread Andrew Dunstan


On 1/21/22 13:55, James Coleman wrote:
> On Thu, Jan 20, 2022 at 3:43 PM James Coleman  wrote:
>> As noted earlier I expect to be posting an updated patch soon.
> Here's the updated series. In 0001 I've moved the documentation tweak
> into the ALTER TABLE notes section. In 0002 I've taken David J's
> suggestion of shortening the "Tip" on the DDL page and mostly using it
> to point people to the Notes section on the ALTER TABLE page.


I don't really like the first part of patch 1, but as it gets removed by
patch 2 we can move past that.


+ Before PostgreSQL 11, adding a new
column to a
+ table required rewriting that table, making it a very slow operation.
+ More recent versions can sometimes optimize away this rewrite and
related
+ validation scans.  See the notes in ALTER TABLE
for details.


I know what it's replacing refers to release 11, but let's stop doing
that. How about something like this?

Adding a new column can sometimes require rewriting the table,
making it a very slow operation. However in many cases this rewrite
and related verification scans can be optimized away by using an
appropriate default value. See the notes in ALTER
TABLE for details.

cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Document atthasmissing default optimization avoids verification table scan

2022-01-21 Thread David G. Johnston
On Fri, Jan 21, 2022 at 11:55 AM James Coleman  wrote:

> On Thu, Jan 20, 2022 at 3:43 PM James Coleman  wrote:
> >
> > As noted earlier I expect to be posting an updated patch soon.
>
> Here's the updated series. In 0001 I've moved the documentation tweak
> into the ALTER TABLE notes section. In 0002 I've taken David J's
> suggestion of shortening the "Tip" on the DDL page and mostly using it
> to point people to the Notes section on the ALTER TABLE page.
>
>
WFM

David J.


Re: Document atthasmissing default optimization avoids verification table scan

2022-01-21 Thread James Coleman
On Thu, Jan 20, 2022 at 3:43 PM James Coleman  wrote:
>
> As noted earlier I expect to be posting an updated patch soon.

Here's the updated series. In 0001 I've moved the documentation tweak
into the ALTER TABLE notes section. In 0002 I've taken David J's
suggestion of shortening the "Tip" on the DDL page and mostly using it
to point people to the Notes section on the ALTER TABLE page.

Thanks,
James Coleman


v2-0001-Document-atthasmissing-default-avoids-verificatio.patch
Description: Binary data


v2-0002-Don-t-double-document-ADD-COLUMN-optimization-det.patch
Description: Binary data


Re: Document atthasmissing default optimization avoids verification table scan

2022-01-20 Thread James Coleman
On Thu, Jan 20, 2022 at 3:31 PM Andrew Dunstan  wrote:
>
>
> On 1/20/22 12:25, Bossart, Nathan wrote:
> > On 1/19/22, 5:15 PM, "James Coleman"  wrote:
> >> I'm open to the idea of wordsmithing here, of course, but I strongly
> >> disagree that this is irrelevant data. There are plenty of
> >> optimizations Postgres could theoretically implement but doesn't, so
> >> measuring what should happen by what you think is obvious ("told it to
> >> populate with default values - why do you need to check") is clearly
> >> not valid.
> >>
> >> This patch actually came out of our specifically needing to verify
> >> that this is true before an op precisely because docs aren't actually
> >> clear and because we can't risk a large table scan under an exclusive
> >> lock. We're clearly not the only ones with that question; it came up
> >> in a comment on this blog post announcing the newly committed feature
> >> [1].
> > My initial reaction was similar to David's.  It seems silly to
> > document that we don't do something that seems obviously unnecessary.
> > However, I think you make a convincing argument for including it.  I
> > agree with David's feedback on where this information should go.
> >
>
> I still don't understand the confusion. When you add a new column with a
> non-null non-volatile default, none of the existing rows has any storage
> for the new column, so there is nothing to scan and nothing to verify on
> such rows. Only the catalog is changed. This is true whether or not the
> new column is constrained by NOT NULL. I don't understand what people
> think might have had to be verified by scanning the table.
>
> If what's happening is not clear from the docs then by all means let's
> make it clear. But in general I don't think we should talk about what we
> used to do.

This path isn't about talking about what we used to do (though that's
already in the docs); it is about trying to make it clear.

But actually "When you add a new column with a non-null non-volatile
default...there is nothing to scan" doesn't always hold as I showed
with the check constraint above. Other than that I think that phrasing
is actually almost close to the kind of clarity I'd like to see in the
docs.

As noted earlier I expect to be posting an updated patch soon.

Thanks,
James Coleman




Re: Document atthasmissing default optimization avoids verification table scan

2022-01-20 Thread Andrew Dunstan


On 1/20/22 12:25, Bossart, Nathan wrote:
> On 1/19/22, 5:15 PM, "James Coleman"  wrote:
>> I'm open to the idea of wordsmithing here, of course, but I strongly
>> disagree that this is irrelevant data. There are plenty of
>> optimizations Postgres could theoretically implement but doesn't, so
>> measuring what should happen by what you think is obvious ("told it to
>> populate with default values - why do you need to check") is clearly
>> not valid.
>>
>> This patch actually came out of our specifically needing to verify
>> that this is true before an op precisely because docs aren't actually
>> clear and because we can't risk a large table scan under an exclusive
>> lock. We're clearly not the only ones with that question; it came up
>> in a comment on this blog post announcing the newly committed feature
>> [1].
> My initial reaction was similar to David's.  It seems silly to
> document that we don't do something that seems obviously unnecessary.
> However, I think you make a convincing argument for including it.  I
> agree with David's feedback on where this information should go.
>

I still don't understand the confusion. When you add a new column with a
non-null non-volatile default, none of the existing rows has any storage
for the new column, so there is nothing to scan and nothing to verify on
such rows. Only the catalog is changed. This is true whether or not the
new column is constrained by NOT NULL. I don't understand what people
think might have had to be verified by scanning the table.

If what's happening is not clear from the docs then by all means let's
make it clear. But in general I don't think we should talk about what we
used to do.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Document atthasmissing default optimization avoids verification table scan

2022-01-20 Thread Bossart, Nathan
On 1/19/22, 5:15 PM, "James Coleman"  wrote:
> I'm open to the idea of wordsmithing here, of course, but I strongly
> disagree that this is irrelevant data. There are plenty of
> optimizations Postgres could theoretically implement but doesn't, so
> measuring what should happen by what you think is obvious ("told it to
> populate with default values - why do you need to check") is clearly
> not valid.
>
> This patch actually came out of our specifically needing to verify
> that this is true before an op precisely because docs aren't actually
> clear and because we can't risk a large table scan under an exclusive
> lock. We're clearly not the only ones with that question; it came up
> in a comment on this blog post announcing the newly committed feature
> [1].

My initial reaction was similar to David's.  It seems silly to
document that we don't do something that seems obviously unnecessary.
However, I think you make a convincing argument for including it.  I
agree with David's feedback on where this information should go.

Nathan



Re: Document atthasmissing default optimization avoids verification table scan

2022-01-20 Thread James Coleman
On Wed, Jan 19, 2022 at 9:34 PM David G. Johnston
 wrote:
>
> On Wed, Jan 19, 2022 at 6:14 PM James Coleman  wrote:
>>
>> I'm open to the idea of wordsmithing here, of course, but I strongly
>> disagree that this is irrelevant data.
>
>
> Ok, but wording aside, only changing a tip in the DDL - Add Table section 
> doesn't seem like a complete fix.  The notes in alter table, where I'd look 
> for such an official directive first, need to be touched as well.
>
> For the alter table docs maybe change/add to the existing sentence below (I'm 
> in favor of not pointing out that scanning the table doesn't mean we are 
> rewriting it, but maybe I'm making another unwarranted assumption regarding 
> obviousness...).
>
> "Adding a CHECK or NOT NULL constraint requires scanning the table [but not 
> rewriting it] to verify that existing rows meet the constraint.  It is 
> skipped when done as part of ADD COLUMN unless a table rewrite is required 
> anyway."

I'm looking over the docs again to see how it might be better
structured; point is well taken that we should have it clearly in the
primary place.

> On that note, does the check constraint interplay with the default rewrite 
> avoidance in the same way?

I hadn't checked until you asked, but interestingly, no it doesn't (I
assume you mean scan not rewrite in this context):

test=# select seq_scan from pg_stat_all_tables where relname = 't2';
 seq_scan
--
2
test=# alter table t2 add column i3 int not null default 5;
ALTER TABLE
test=# select seq_scan from pg_stat_all_tables where relname = 't2';
 seq_scan
--
2
test=# alter table t2 add column i4 int default 5 check (i4 < 50);
ALTER TABLE
test=# select seq_scan from pg_stat_all_tables where relname = 't2';
 seq_scan
--
3

That seems like an opportunity for improvement here, though it's
obviously a separate patch. I might poke around at that though
later...

> For the Tip I'd almost rather redo it to say:
>
> "Before PostgreSQL 11, adding a new column to a table required rewriting that 
> table, making it a very slow operation.  More recent versions can sometimes 
> optimize away this rewrite and related validation scans.  See the notes in 
> ALTER TABLE for details."
>
> Though I suppose I'd accept something like (leave existing text, alternative 
> patch text):
>
> "[...]large tables.\nIf the added column also has a not null constraint the 
> usual verification scan is also skipped."
>
> "constant" is used in the Tip, "non-volatile" is used in alter table - hence 
> a desire to have just one source of truth, with alter table being the correct 
> place.  We should sync them up otherwise.

As noted I'll look over how restructuring might improve and reply with
an updated proposed patch.

Thanks,
James Coleman




Re: Document atthasmissing default optimization avoids verification table scan

2022-01-19 Thread David G. Johnston
On Wed, Jan 19, 2022 at 6:14 PM James Coleman  wrote:

> I'm open to the idea of wordsmithing here, of course, but I strongly
> disagree that this is irrelevant data.


Ok, but wording aside, only changing a tip in the DDL - Add Table section
doesn't seem like a complete fix.  The notes in alter table, where I'd look
for such an official directive first, need to be touched as well.

For the alter table docs maybe change/add to the existing sentence below
(I'm in favor of not pointing out that scanning the table doesn't mean we
are rewriting it, but maybe I'm making another unwarranted assumption
regarding obviousness...).

"Adding a CHECK or NOT NULL constraint requires scanning the table [but not
rewriting it] to verify that existing rows meet the constraint.  It is
skipped when done as part of ADD COLUMN unless a table rewrite is required
anyway."

On that note, does the check constraint interplay with the default rewrite
avoidance in the same way?

For the Tip I'd almost rather redo it to say:

"Before PostgreSQL 11, adding a new column to a table required rewriting
that table, making it a very slow operation.  More recent versions can
sometimes optimize away this rewrite and related validation scans.  See the
notes in ALTER TABLE for details."

Though I suppose I'd accept something like (leave existing text,
alternative patch text):

"[...]large tables.\nIf the added column also has a not null constraint the
usual verification scan is also skipped."

"constant" is used in the Tip, "non-volatile" is used in alter table -
hence a desire to have just one source of truth, with alter table being the
correct place.  We should sync them up otherwise.

David J.


Re: Document atthasmissing default optimization avoids verification table scan

2022-01-19 Thread James Coleman
On Wed, Jan 19, 2022 at 7:51 PM David G. Johnston
 wrote:
>
> On Wed, Jan 19, 2022 at 5:08 PM Bossart, Nathan  wrote:
>>
>> On 9/24/21, 7:30 AM, "James Coleman"  wrote:
>> > When PG11 added the ability for ALTER TABLE ADD COLUMN to set a constant
>> > default value without rewriting the table the doc changes did not note
>> > how the new feature interplayed with ADD COLUMN DEFAULT NOT NULL.
>> > Previously such a new column required a verification table scan to
>> > ensure no values were null. That scan happens under an exclusive lock on
>> > the table, so it can have a meaningful impact on database "accessible
>> > uptime".
>>
>> I'm likely misunderstanding, but are you saying that adding a new
>> column with a default value and a NOT NULL constraint used to require
>> a verification scan?
>
>
> As a side-effect of rewriting every live record in the table and indexes to 
> brand new files, yes.  I doubt an actual independent scan was performed since 
> the only way for the newly written tuples to not have the default value 
> inserted would be a severe server bug.

I've confirmed it wasn't a separate scan, but it does evaluate all
constraints (it doesn't have any optimizations for skipping ones
probably true by virtue of the new default).

>>
>> + Additionally adding a column with a constant default value avoids a
>> + a table scan to verify no NULL values are present.
>>
>> Should this clarify that it's referring to NOT NULL constraints?
>>
>
> This doesn't seem like relevant material to comment on.  It's an 
> implementation detail that is sufficiently covered by "making the ALTER TABLE 
> very fast even on large tables".
>
> Also, the idea of performing that scan seems ludicrous.  I just added the 
> column and told it to populate with default values - why do you need to check 
> that your server didn't miss any?

I'm open to the idea of wordsmithing here, of course, but I strongly
disagree that this is irrelevant data. There are plenty of
optimizations Postgres could theoretically implement but doesn't, so
measuring what should happen by what you think is obvious ("told it to
populate with default values - why do you need to check") is clearly
not valid.

This patch actually came out of our specifically needing to verify
that this is true before an op precisely because docs aren't actually
clear and because we can't risk a large table scan under an exclusive
lock. We're clearly not the only ones with that question; it came up
in a comment on this blog post announcing the newly committed feature
[1].

I realize that most users aren't as worried about this kind of
specific detail about DDL as we are (requiring absolutely zero slow
DDL while under an exclusive lock), but it is relevant to high uptime
systems.

Thanks,
James Coleman

1: 
https://www.depesz.com/2018/04/04/waiting-for-postgresql-11-fast-alter-table-add-column-with-a-non-null-default/




Re: Document atthasmissing default optimization avoids verification table scan

2022-01-19 Thread David G. Johnston
On Wed, Jan 19, 2022 at 5:08 PM Bossart, Nathan  wrote:

> On 9/24/21, 7:30 AM, "James Coleman"  wrote:
> > When PG11 added the ability for ALTER TABLE ADD COLUMN to set a constant
> > default value without rewriting the table the doc changes did not note
> > how the new feature interplayed with ADD COLUMN DEFAULT NOT NULL.
> > Previously such a new column required a verification table scan to
> > ensure no values were null. That scan happens under an exclusive lock on
> > the table, so it can have a meaningful impact on database "accessible
> > uptime".
>
> I'm likely misunderstanding, but are you saying that adding a new
> column with a default value and a NOT NULL constraint used to require
> a verification scan?
>

As a side-effect of rewriting every live record in the table and indexes to
brand new files, yes.  I doubt an actual independent scan was performed
since the only way for the newly written tuples to not have the default
value inserted would be a severe server bug.


> + Additionally adding a column with a constant default value avoids a
> + a table scan to verify no NULL values are present.
>
> Should this clarify that it's referring to NOT NULL constraints?
>
>
This doesn't seem like relevant material to comment on.  It's an
implementation detail that is sufficiently covered by "making the ALTER
TABLE very fast even on large tables".

Also, the idea of performing that scan seems ludicrous.  I just added the
column and told it to populate with default values - why do you need to
check that your server didn't miss any?

David J.


Re: Document atthasmissing default optimization avoids verification table scan

2022-01-19 Thread Bossart, Nathan
On 9/24/21, 7:30 AM, "James Coleman"  wrote:
> When PG11 added the ability for ALTER TABLE ADD COLUMN to set a constant
> default value without rewriting the table the doc changes did not note
> how the new feature interplayed with ADD COLUMN DEFAULT NOT NULL.
> Previously such a new column required a verification table scan to
> ensure no values were null. That scan happens under an exclusive lock on
> the table, so it can have a meaningful impact on database "accessible
> uptime".

I'm likely misunderstanding, but are you saying that adding a new
column with a default value and a NOT NULL constraint used to require
a verification scan?

+ Additionally adding a column with a constant default value avoids a
+ a table scan to verify no NULL values are present.

Should this clarify that it's referring to NOT NULL constraints?

Nathan