subject:"\[HACKERS\] Fast AT ADD COLUMN with DEFAULTs"

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-15 Thread Serge Rielau

This feature was added in DB2 year ago. AFAIK it was not very successful. 
Regular compression techniques proved serve a broader and purpose and save 
more space.
http://www.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.dbobj.doc/doc/c0056482.html 
[http://www.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.dbobj.doc/doc/c0056482.html]
via Newton Mail 
[https://cloudmagic.com/k/d/mailapp?ct=pi&cv=9.1.19&pv=10.0.2&source=email_footer_2]

On Sat, Oct 15, 2016 at 09:10, Jim Nasby  wrote:
On 10/9/16 11:02 PM, Corey Huinker wrote:
>
> There's actually another use case here that's potentially extremely
> valuable for warehousing and other "big data": compact
> representation of a default value.
>
>
> I too would benefit from tables having either a default value in the
> event of a NOT-NULL column being flagged null, or a flat-out constant.
>
> This would be a big win in partitioned tables where the partition can
> only hold one value of the partitioning column.

I hadn't thought of that use case... with rowcounts in the billions
becoming pretty common even the cost of a 4 byte enum starts to add up.

> I guess a constant would be a pg_type where the sole value is encoded,
> and the column itself is stored like an empty string.

Not empty string; the storage would look like NULL does today; the
difference being that we'd know that attribute wasn't NULL-able so if
it's marked as being "NULL" it actually means it has the default value.
Though obviously this would only work if the default was a Const, and
you wouldn't be able to change the default without ensuring no rows in
the table were using this trick. But I suspect there's still plenty of
scenarios where the advantage is worth it.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-15 Thread Jim Nasby


On 10/9/16 11:02 PM, Corey Huinker wrote:


There's actually another use case here that's potentially extremely
valuable for warehousing and other "big data": compact
representation of a default value.


I too would benefit from tables having either a default value in the
event of a NOT-NULL column being flagged null, or a flat-out constant.

This would be a big win in partitioned tables where the partition can
only hold one value of the partitioning column.


I hadn't thought of that use case... with rowcounts in the billions 
becoming pretty common even the cost of a 4 byte enum starts to add up.



I guess a constant would be a pg_type where the sole value is encoded,
and the column itself is stored like an empty string.


Not empty string; the storage would look like NULL does today; the 
difference being that we'd know that attribute wasn't NULL-able so if 
it's marked as being "NULL" it actually means it has the default value. 
Though obviously this would only work if the default was a Const, and 
you wouldn't be able to change the default without ensuring no rows in 
the table were using this trick. But I suspect there's still plenty of 
scenarios where the advantage is worth it.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-09 Thread Corey Huinker

>
>
> There's actually another use case here that's potentially extremely
> valuable for warehousing and other "big data": compact representation of a
> default value.
>
>
I too would benefit from tables having either a default value in the event
of a NOT-NULL column being flagged null, or a flat-out constant.

This would be a big win in partitioned tables where the partition can only
hold one value of the partitioning column.

I guess a constant would be a pg_type where the sole value is encoded, and
the column itself is stored like an empty string.

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-09 Thread Jim Nasby


On 10/6/16 11:01 AM, Tom Lane wrote:

Something based on missing_value/absent_value could work for me too.
If we name it something involving "default", that definitely increases
the possibility for confusion with the regular user-settable default.

Also worth thinking about here is that the regular default expression
affects what will be put into future inserted rows, whereas this thing
affects the interpretation of past rows.  So it's really quite a different
animal.  That's kind of leading me away from calling it creation_default.


There's actually another use case here that's potentially extremely 
valuable for warehousing and other "big data": compact representation of 
a default value.


The idea here is that if you have a specific value for a field that 
makes up a very large portion of your data, you'd really like to be able 
to represent that value *in each row* with something like a bit (such as 
we currently do for NULLs).


What I'd expect to see in the real world (once users figure this hack 
out) would be:


CREATE TABLE ...(
  ...
  -- skip field_a so we can handle all it's common values
);

INSERT INTO
  ...
  SELECT
...
WHERE field_a IS NOT DISTINCT FROM 'really common value'
;

ALTER TABLE
  ADD field_a ... NOT NULL DEFAULT 'really common value'
;

-- load rest of the data

That would have the effect of storing all those really common values 
with a single bit.


What we'd ultimately want is some kind of catalog versioning so that we 
knew what was in place when each tuple was created; that would allow for 
changing these things over time without forcing a full rewrite.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-06 Thread Vitaly Burovoy

On 10/6/16, Serge Rielau  wrote:
>> On Oct 6, 2016, at 9:20 AM, Tom Lane  wrote:
>> Vitaly Burovoy  writes:
>>> But what I discover for myself is that we have pg_attrdef separately
>>> from the pg_attribute. Why?
>>
>> The core reason for that is that the default expression needs to be
>> a separate object from the column for purposes of dependency analysis.
>> For example, if you have a column whose default is "foo()", then the
>> default expression depends on the function foo(), but the column should
>> not: if you drop the function, only the default expression ought to
>> be dropped, not the column.
>>
>> Because of this, the default expression needs to have its own OID
>> (to be stored in pg_depend) and it's convenient to store it in a
>> separate catalog so that the classoid can identify it as being a
>> default expression rather than some other kind of object.
>
> Good to know.
>
>> If we were going to allow these missing_values or creation_defaults
>> or whatever they're called to be general expressions, then they would
>> need
>> to have their own OIDs for dependency purposes.  That would lead me to
>> think that the best representation is to put them in their own rows in
>> pg_attrdef, perhaps adding a boolean column to pg_attrdef to distinguish
>> regular defaults from these things.  Or maybe they even need their own
>> catalog, depending on whether you think dependency analysis would want
>> to distinguish them from regular defaults using just the classed.
>>
>> Now, as I just pointed out in another mail, realistically we're probably
>> going to restrict the feature to simple constants, which'd mean they will
>> depend only on the column's type and can never need any dependencies of
>> their own.  So we could take the shortcut of just storing them in a new
>> column in pg_attribute.

I agree with you.

>> But maybe that's shortsighted and we'll
>> eventually wish we'd done them as full-fledged separate objects.

I don't think so. If we try to implement non-blocking adding columns
with volatile defaults (and for instance update old rows in the
background), we can end up with the next situation:

CREATE TABLE a(i bigint PRIMARY KEY);
INSERT INTO a SELECT generate_series(1,100);

ALTER TABLE a ADD COLUMN b bigserial CHECK (b BETWEEN 1 AND 100);

For indexes (even unique) created concurrently similar troubles are
solved with a "not valid" mark, but what to do with a heap if we try
to do it in the background?

>> But on the third hand ... once one of these is in place, how could you
>> drop it separately from the column?  That would amount to a change in the
>> column's stored data, which is not what one would expect from dropping
>> a separate object.  So maybe it's senseless to think that these things
>> could ever be distinct objects.  But that definitely leads to the
>> conclusion that they're constants and nothing else.

> I cannot follow this reasoning.
> Let’s look past what PG does today:
> For each row (whether that’s necessary or not) we evaluate the expression,
> compute the value and
> store it in the rewritten table.
> We do not record dependencies on the “pedigree” of the value.
> It happened to originate from the DEFAULT expression provided with the ADD
> COLUMN,
> but that is not remembered anywhere.
> All we remember is the value - in each row.
> So the only change that is proposed here - when it comes right down to it -
> is to remember the value once only (IFF it is provably the same for each row)
> and thus avoid the need to rewrite the table.
> So I see no reason to impose any restriction other than “evaluated value is
> provably the same for every row”.

Tom says the same thing. The expression at the end should be a value
if it allows to avoid rewriting table.

> Regarding the location of storage.
> I did start of using pg_attrdef, but ran into some snags.
> My approach was to add the value as an extra column (rather than an extra
> row).
> That caused trouble since a SET DEFAULT operation is decomposed into a DROP
> and a SET and
> preserving the value across such operations did not come naturally.

I'm sorry for making you be confused. The best way is to use an extra
column in the pg_attribute to store serialized value.

> If we were to use extra rows instead that issue would be solved, assuming we
> add a “default kind” sort of column.
> It would dictate the storage format though which may be considered overkill
> for a a constant.

-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-06 Thread Serge Rielau


> On Oct 6, 2016, at 9:20 AM, Tom Lane  wrote:
> 
> Vitaly Burovoy  writes:
>> But what I discover for myself is that we have pg_attrdef separately
>> from the pg_attribute. Why?
> 
> The core reason for that is that the default expression needs to be
> a separate object from the column for purposes of dependency analysis.
> For example, if you have a column whose default is "foo()", then the
> default expression depends on the function foo(), but the column should
> not: if you drop the function, only the default expression ought to
> be dropped, not the column.
> 
> Because of this, the default expression needs to have its own OID
> (to be stored in pg_depend) and it's convenient to store it in a
> separate catalog so that the classoid can identify it as being a
> default expression rather than some other kind of object.
Good to know.
> 
> If we were going to allow these missing_values or creation_defaults
> or whatever they're called to be general expressions, then they would need
> to have their own OIDs for dependency purposes.  That would lead me to
> think that the best representation is to put them in their own rows in
> pg_attrdef, perhaps adding a boolean column to pg_attrdef to distinguish
> regular defaults from these things.  Or maybe they even need their own
> catalog, depending on whether you think dependency analysis would want
> to distinguish them from regular defaults using just the classed.
> 
> Now, as I just pointed out in another mail, realistically we're probably
> going to restrict the feature to simple constants, which'd mean they will
> depend only on the column's type and can never need any dependencies of
> their own.  So we could take the shortcut of just storing them in a new
> column in pg_attribute.  But maybe that's shortsighted and we'll
> eventually wish we'd done them as full-fledged separate objects.
> 
> But on the third hand ... once one of these is in place, how could you
> drop it separately from the column?  That would amount to a change in the
> column's stored data, which is not what one would expect from dropping
> a separate object.  So maybe it's senseless to think that these things
> could ever be distinct objects.  But that definitely leads to the
> conclusion that they're constants and nothing else.
I cannot follow this reasoning.
Let’s look past what PG does today:
For each row (whether that’s necessary or not) we evaluate the expression, 
compute the value and 
store it in the rewritten table.
We do not record dependencies on the “pedigree” of the value.
It happened to originate from the DEFAULT expression provided with the ADD 
COLUMN,
but that is not remembered anywhere.
All we remember is the value - in each row.

So the only change that is proposed here - when it comes right down to it - is 
to
remember the value once only (IFF it is provably the same for each row) and 
thus 
avoid the need to rewrite the table.
So I see no reason to impose any restriction other than “evaluated value is 
provably the same for every row”.

Regarding the location of storage.
I did start of using pg_attrdef, but ran into some snags.
My approach was to add the value as an extra column (rather than an extra row).
That caused trouble since a SET DEFAULT operation is decomposed into a DROP and 
a SET and
preserving the value across such operations did not come naturally.

If we were to use extra rows instead that issue would be solved, assuming we ad 
a “default kind” sort of column.
It would dictate the storage format though which may be considered overkill for 
a a constant. 

Cheers
Serge

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-06 Thread Vitaly Burovoy

On 10/6/16, Vitaly Burovoy  wrote:
> Ough. I made a mistake about pg_attribute because I forgot about the
> pg_attrdef.
> If we do not merge these tables, the pg_attrdef is the best place to
> store evaluated expression as a constant the same way defaults are
> stored in adbin.

Oops. While I was writing the previous email, Tom explained necessity
of the pg_attrdef.
With that explanation it is obvious that I was wrong and a value for
pre-existing rows should be in a new column in the pg_attribute.
All the other thoughts from my previous email stand good.

-- 
Best regards,
Vitaly Burovoy

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-06 Thread Vitaly Burovoy

On 10/6/16, Serge Rielau  wrote:
>> On Oct 6, 2016, at 9:01 AM, Tom Lane  wrote:
>> BTW, it also occurs to me that there are going to be good implementation
>> reasons for restricting it to be a hard constant, not any sort of
>> expression.  We are likely to need to be able to insert the value in
>> low-level code where general expression evaluation is impractical.
>>
> Yes, the padding must happen primarily in the getAttr() routines.
> Clearly we do not want to evaluate an expression there.
> But what speaks against evaluating the expression before we store it?
> After all we seem to all agree that this only works if the expression
> computes to the same constant all the time.
>
> If we do not want to store an “untyped” datum straight in pg_attribute as a
> BYTEA (my current approach)

Ough. I made a mistake about pg_attribute because I forgot about the pg_attrdef.
If we do not merge these tables, the pg_attrdef is the best place to
store evaluated expression as a constant the same way defaults are
stored in adbin.

> we could store the pretty printed version of the constant

It is a wrong way. It ruins commands like "ALTER COLUMN ... TYPE ... USING"

> and evaluate that when we build the tuple descriptor.
> This happens when we load the relation into the relcache.
>
> Anyway, I’m jumping ahead and it’s perhaps best to let the code speak for
> itself once I have the WIP patch ready so we have something concrete to
> discuss

-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-06 Thread Tom Lane

Vitaly Burovoy  writes:
> But what I discover for myself is that we have pg_attrdef separately
> from the pg_attribute. Why?

The core reason for that is that the default expression needs to be
a separate object from the column for purposes of dependency analysis.
For example, if you have a column whose default is "foo()", then the
default expression depends on the function foo(), but the column should
not: if you drop the function, only the default expression ought to
be dropped, not the column.

Because of this, the default expression needs to have its own OID
(to be stored in pg_depend) and it's convenient to store it in a
separate catalog so that the classoid can identify it as being a
default expression rather than some other kind of object.

If we were going to allow these missing_values or creation_defaults
or whatever they're called to be general expressions, then they would need
to have their own OIDs for dependency purposes.  That would lead me to
think that the best representation is to put them in their own rows in
pg_attrdef, perhaps adding a boolean column to pg_attrdef to distinguish
regular defaults from these things.  Or maybe they even need their own
catalog, depending on whether you think dependency analysis would want
to distinguish them from regular defaults using just the classoid.

Now, as I just pointed out in another mail, realistically we're probably
going to restrict the feature to simple constants, which'd mean they will
depend only on the column's type and can never need any dependencies of
their own.  So we could take the shortcut of just storing them in a new
column in pg_attribute.  But maybe that's shortsighted and we'll
eventually wish we'd done them as full-fledged separate objects.

But on the third hand ... once one of these is in place, how could you
drop it separately from the column?  That would amount to a change in the
column's stored data, which is not what one would expect from dropping
a separate object.  So maybe it's senseless to think that these things
could ever be distinct objects.  But that definitely leads to the
conclusion that they're constants and nothing else.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-06 Thread Vitaly Burovoy

On 10/6/16, Tom Lane  wrote:
> Serge Rielau  writes:
>>> On Oct 6, 2016, at 5:25 AM, Vitaly Burovoy 
>>> wrote:
 Which makes me think we should call this missing_value or absent_value

Be honest Simon Rigg's wrote that words.

 so its clear that it is not a "default" it is the value we use for
 rows that do not have any value stored for them.
>
>> I like Tom’s “creation default”. Another one could be “initial default”.
>> But that, too, can be misread.
>
> Something based on missing_value/absent_value could work for me too.
>
> If we name it something involving "default", that definitely increases
> the possibility for confusion with the regular user-settable default.
>
> Also worth thinking about here is that the regular default expression
> affects what will be put into future inserted rows, whereas this thing
> affects the interpretation of past rows.  So it's really quite a different
> animal.  That's kind of leading me away from calling it creation_default.
>
> BTW, it also occurs to me that there are going to be good implementation
> reasons for restricting it to be a hard constant, not any sort of
> expression.  We are likely to need to be able to insert the value in
> low-level code where general expression evaluation is impractical.

Yes, I mentioned that it should be evaluated and stored as a value
because user functions can be changed (besides the speed reason),
that's why I like the "value" in its name. The "default" is usually
identified with expressions, not values (which are particular cases of
expressions).

Serge mentioned the phrase "pre-existing rows", which makes me think
about something like "pre_existing_value"

-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-06 Thread Serge Rielau


> On Oct 6, 2016, at 9:01 AM, Tom Lane  wrote:
> 
> BTW, it also occurs to me that there are going to be good implementation
> reasons for restricting it to be a hard constant, not any sort of
> expression.  We are likely to need to be able to insert the value in
> low-level code where general expression evaluation is impractical.
> 
Yes, the padding must happen primarily in the getAttr() routines.
Clearly we do not want to evaluate an expression there.
But what speaks against evaluating the expression before we store it?
After all we seem to all agree that this only works if the expression computes 
to the same constant all the time.

If we do not want to store an “untyped” datum straight in pg_attribute as a 
BYTEA (my current approach) we could store the pretty printed version of the 
constant
and evaluate that when we build the tuple descriptor.
This happens when we load the relation into the relcache.

Anyway, I’m jumping ahead and it’s perhaps best to let the code speak for 
itself once I have the WIP patch ready so we have something concrete to discuss

Cheers
Serge




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-06 Thread Tom Lane

Serge Rielau  writes:
>> On Oct 6, 2016, at 5:25 AM, Vitaly Burovoy  wrote:
>>> Which makes me think we should call this missing_value or absent_value
>>> so its clear that it is not a "default" it is the value we use for
>>> rows that do not have any value stored for them.

> I like Tomâs âcreation defaultâ. Another one could be âinitial 
> defaultâ.
> But that, too, can be misread.

Something based on missing_value/absent_value could work for me too.
If we name it something involving "default", that definitely increases
the possibility for confusion with the regular user-settable default.

Also worth thinking about here is that the regular default expression
affects what will be put into future inserted rows, whereas this thing
affects the interpretation of past rows.  So it's really quite a different
animal.  That's kind of leading me away from calling it creation_default.

BTW, it also occurs to me that there are going to be good implementation
reasons for restricting it to be a hard constant, not any sort of
expression.  We are likely to need to be able to insert the value in
low-level code where general expression evaluation is impractical.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-06 Thread Serge Rielau


> On Oct 6, 2016, at 5:25 AM, Vitaly Burovoy  wrote:
> 
> On 10/6/16, Simon Riggs  wrote:
>> On 6 October 2016 at 04:43, Serge Rielau  wrote:
> Or should I compose some sort of a design document?
>> 
>> Having read this thread, I'm a little unclear as to what you're
>> writing now, though there's definitely good ideas here.
>> 
>> I think it would be beneficial to write up a single coherent
>> description of this, including behaviour and a small sketch of
>> implementation, just so everyone knows what this is. No design doc,
>> but a summary.
> 
> At the moment I think it can also be a good idea to post the current
> patch as a Proposal or a WIP to get initial feedback.
I can do that -  Accompanied by a posting sized overview.

> 
> Yes, it works for stable "now()" but does not work for volatile
> functions like "random()", "uuid_generate_v4()" or default for serial
> columns. The only possible way I can see is to check an expression has
> only "T_Const"s, static and stable functions. In such case the
> expression can be evaluated and the result be saved as a value for
> absented attributes of a tuple. In the other case save NULL there and
> rewrite the table.
Agreed. I think DEFAULT as-is does the job nicely function wise.
One can always decompose the ADD COLUMN into two steps within the same 
transaction 
if the initial column value for pre-existing rows does not match the default 
for new or updated rows.
 
AT Just needs a performance boost for large tables where that’s reasonably 
possible.

>> Which makes me think we should call this missing_value or absent_value
>> so its clear that it is not a "default" it is the value we use for
>> rows that do not have any value stored for them.
> 
> It is definitely a default for a user, it is not a regular default internally.
> I'm not a native speaker, "absent_value" can be mixed up with a NULL.
> As for me the best phrase is "pre-add-column-default", but it is
> impossible to use it as a column name. :-(
> It is still an open question.
I like Tom’s “creation default”. Another one could be “initial default”.
But that, too, can be misread.

Cheers
Serge Rielau
Salesforce.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-06 Thread Vitaly Burovoy

On 10/6/16, Simon Riggs  wrote:
> On 6 October 2016 at 04:43, Serge Rielau  wrote:
 Or should I compose some sort of a design document?
>
> Having read this thread, I'm a little unclear as to what you're
> writing now, though there's definitely good ideas here.
>
> I think it would be beneficial to write up a single coherent
> description of this, including behaviour and a small sketch of
> implementation, just so everyone knows what this is. No design doc,
> but a summary.

At the moment I think it can also be a good idea to post the current
patch as a Proposal or a WIP to get initial feedback.

> It would be very useful to be able to do this...
> ALTER TABLE foo ADD last_updated_timestamp timestamp default
> current_timestamp
> so that it generates a constant value and stores that for all prior
> rows, but then generates a new value for future rows.

Yes, it works for stable "now()" but does not work for volatile
functions like "random()", "uuid_generate_v4()" or default for serial
columns. The only possible way I can see is to check an expression has
only "T_Const"s, static and stable functions. In such case the
expression can be evaluated and the result be saved as a value for
absented attributes of a tuple. In the other case save NULL there and
rewrite the table.

> Which makes me think we should call this missing_value or absent_value
> so its clear that it is not a "default" it is the value we use for
> rows that do not have any value stored for them.

It is definitely a default for a user, it is not a regular default internally.
I'm not a native speaker, "absent_value" can be mixed up with a NULL.
As for me the best phrase is "pre-add-column-default", but it is
impossible to use it as a column name. :-(
It is still an open question.

(I remember funny versions in a discussion[1] when people tried to
choose a name for a function reversed to pg_size_pretty...)

[1] 
https://www.postgresql.org/message-id/flat/cafj8prd-tgodknxdygecza4on01_urqprwf-8ldkse-6bdh...@mail.gmail.com

--
Best regards,
Vitaly Burovoy

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-06 Thread Simon Riggs

On 6 October 2016 at 04:43, Serge Rielau  wrote:

>>> Or should I compose some sort of a design document?

Having read this thread, I'm a little unclear as to what you're
writing now, though there's definitely good ideas here.

I think it would be beneficial to write up a single coherent
description of this, including behaviour and a small sketch of
implementation, just so everyone knows what this is. No design doc,
but a summary.


It would be very useful to be able to do this...
ALTER TABLE foo ADD last_updated_timestamp timestamp default current_timestamp
so that it generates a constant value and stores that for all prior
rows, but then generates a new value for future rows.

Which makes me think we should call this missing_value or absent_value
so its clear that it is not a "default" it is the value we use for
rows that do not have any value stored for them.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Serge Rielau


> On Oct 5, 2016, at 5:52 PM, Vitaly Burovoy  wrote:
> 
> On 10/5/16, Serge Rielau  wrote:
>> I want to point out as a minor "extension" that there is no need for the
>> default to be immutable. It is merely required that the default is evaluate
>> at time of ADD COLUMN and then we remember the actual value for the exist
>> default, rather than the parsed expression as we do for the "current"
>> default.
> 
> I don't think it will be accepted.
And I wouldn’t expect it to. I had a misunderstanding on what PG did.
Clearly the enhancement must be semantically neutral and be limited to the 
cases where that can be asserted.
So my patch will detect that situation and fall back to the original behavior 
as needed.
> Your patch must be based on a just "master" branch.
> If you haven't seen wiki pages [1], [2] and [3], it is the time to do
> it (and related ones).
> 
>> Or should I compose some sort of a design document?
> 
> Since Tom Lane, Andres Freund and other people agreed "it does work",
> you may post a patch to a new thread and write a short (but clean
> enough) description with a link to the current thread. Examples can be
> seen by links from the CF[4].
Thanks of rte guidance.
It will take a bit of time to port to community code and complete QA.
I shall return….
 
Serge Rielau
Salesforce.com



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Vitaly Burovoy

On 10/5/16, Serge Rielau  wrote:
>On Wed, Oct 5, 2016 at 4:19 PM, Vitaly Burovoy  
>wrote:
>> But what I discover for myself is that we have pg_attrdef separately
>> from the pg_attribute. Why?
>> Is it time to join them? For not presented defaults it would be only
>> one bit per row(if we avoid "adsrc" as it is recommended), but for a
>> separate table it is 11 columns with two indexes now...
>
> In terms of footprint we may be able to remove pg_attrdef.
> I would consider that orthogonal to the proposed feature though.

It was a question mostly to the community rather than to you.

> The internal representation of defaults in the tuple descriptor still needs 
> to be a map of sorts.
>
> To comment on Pantelis SQL Server Reference:
> Other vendors such as Oracle and DB2 also support this feature.
>
> The listed restriction made me loop back to Vitaly’s original serial example:
> ALTER TABLE t ADD COLUMN c2 serial;
> and rethink Tom’s struct restriction to constants.

I'm sorry, the correct example with "now" should be:
ALTER TABLE t ADD COLUMN c1 timestamptz NOT NULL DEFAULT
'now'::text::timestamptz;

I wanted to show that for non-immutable (even stable) expression as a
default one each time selected old tuples gives different result.
But your implementation evaluates it before saving as a
"pre-add-column" default and it breakes the current behavior in a
different way.

> In PG the proposed feature would also have to be limited to immutable(?)
> default expressions to comply with existing behavior, which matches SQL 
> Servers.
>
> My current patch does not restrict that and thusly falsely "fills in" the 
> same value for all rows.

If you change the current behavior (just making it "faster") you
should save the current behavior.
The best case is to determine whether it is possible to do a "fast"
change and fill the "pre-add-column" default parameter or leave the
current "long" behavior.

I assure it is often enough to add columns with a default value which
fills the current rows by non-static values.
One of them is adding a serial column (which is a macros for "CREATE
SEQUENCE+SET NULL+SET DEFAULT
nextval(just_created_sequence_relation)"), others are
"uuid_generate_vX(...)" and "random()"

On 10/5/16, Serge Rielau  wrote:
> I want to point out as a minor "extension" that there is no need for the
> default to be immutable. It is merely required that the default is evaluate
> at time of ADD COLUMN and then we remember the actual value for the exist
> default, rather than the parsed expression as we do for the "current"
> default.

I don't think it will be accepted.

On 10/5/16, Serge Rielau  wrote:
> Thanks.
> This would be my first contribution.
> I take it I would post a patch based on a recent PG 9.6 master for review?

Your patch must be based on a just "master" branch.
If you haven't seen wiki pages [1], [2] and [3], it is the time to do
it (and related ones).

> Or should I compose some sort of a design document?

Since Tom Lane, Andres Freund and other people agreed "it does work",
you may post a patch to a new thread and write a short (but clean
enough) description with a link to the current thread. Examples can be
seen by links from the CF[4].
Nevertheless it is important to honor all thoughts mentioned here
because if your patch breaks the current behavior (with no significant
reason) it is senseless (even if it matches the behavior of other
RDBMS) and will not be committed.

Of cource, feel free to ask.

[1] https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F
[2] https://wiki.postgresql.org/wiki/Developer_FAQ
[3] https://wiki.postgresql.org/wiki/Submitting_a_Patch
[4] https://commitfest.postgresql.org/11/

-- 
Best regards,
Vitaly Burovoy

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Serge Rielau

via Newton Mail 
[https://cloudmagic.com/k/d/mailapp?ct=dx&cv=9.0.74&pv=10.11.6&source=email_footer_2]
On Wed, Oct 5, 2016 at 4:19 PM, Vitaly Burovoy  wrote:
On 10/5/16, Tom Lane  wrote:
> I wrote:
>> Need a better name for the concept, since evidently this name isn't
>> conveying the idea.
>
> Maybe "creation default" would work better? Point being it's the
> default value at the time of column creation.
Hmm... Personaly for me the original topic name is good enough. I think at 
issue is with the term “exist default” rather than the feature/topic name (?) 
But what I discover for myself is that we have pg_attrdef separately
from the pg_attribute. Why?
Is it time to join them? For not presented defaults it would be only
one bit per row(if we avoid "adsrc" as it is recommended), but for a
separate table it is 11 columns with two indexes now... In terms of footprint 
we may be able to remove pg_attrdef. I would consider that orthogonal to the 
proposed feature though. The internal representation of defaults in the tuple 
descriptor still needs to be a map of sorts.
To comment on Pantelis SQL Server Reference: Other vendors such as Oracle and 
DB2 also support this feature.
The listed restriction made me loop back to Vitaly’s original serial example: 
ALTER TABLE t ADD COLUMN c2 serial; and rethink Tom’s struct restriction to 
constants.
In PG the proposed feature would also have to be limited to immutable(?) 
default expressions to comply with existing behavior, which matches SQL Servers.
My current patch does not restrict that and thusly falsely "fills in" the same 
value for all rows.
Cheers Serge Rielau Salesforce.com

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Vitaly Burovoy

On 10/5/16, Tom Lane  wrote:
> I wrote:
>> Need a better name for the concept, since evidently this name isn't
>> conveying the idea.
>
> Maybe "creation default" would work better?  Point being it's the
> default value at the time of column creation.

Hmm... Personaly for me the original topic name is good enough.


But what I discover for myself is that we have pg_attrdef separately
from the pg_attribute. Why?
Is it time to join them? For not presented defaults it would be only
one bit per row(if we avoid "adsrc" as it is recommended), but for a
separate table it is 11 columns with two indexes now...
-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Pantelis Theodosiou

On Thu, Oct 6, 2016 at 12:05 AM, Serge Rielau  wrote:

>
> via Newton Mail
> 
>
> On Wed, Oct 5, 2016 at 3:58 PM, Tom Lane  wrote:
>
> Andres Freund  writes:
> > On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:
> >>> No, "a second “exist default"" was mentioned, i.e. it is an additional
> >>> column in a system table (pg_attribute) as default column values of
> >>> the "pre-alter" era. It solves changing of the default expression of
> >>> the same column later.
>
> > Don't think that actually solves the issue. The default might be unset
> > for a while, for example. Essentially you'd need to be able to associate
> > arbitrary number of default values with an arbitrary set of rows.
>
> I think it does work, as long as the "exists default" is immutable.
> (For safety, personally, I'd restrict it to be a verbatim constant.)
> The point is that you apply that when you are reading a row that has
> so few columns that it must predate the original ALTER TABLE ADD COLUMN.
> Subsequent ADD/DROP/SET DEFAULT affect the "active default" and hence
> insertions that happen after them, but they don't affect the
> interpretation of old rows. And of course all rows inserted after the
> ADD COLUMN contain explicit values of the column, so their meaning is
> unaffected in any case.
>
>
> You do need two defaults associated with a column to make this work.
> The "exists default" never changes after the column is added. But
> in principle, the "exists default" just replaces the NULL value that
> we implicitly insert now in such cases.
>
> Explained so much better than I could do it :-)
>
> I want to point out as a minor “extension” that there is no need for the
> default to be immutable. It is merely required that the default is evaluate
> at time of ADD COLUMN
> and then we remember the actual value for the exist default, rather than
> the parsed expression as we do for the “current” default.
>
>
> Need a better name for the concept, since evidently this name isn't
> conveying the idea.
>
> By all means. Got anything in mind?
>
>
>
For comparison, SQL Server's implementation. They have a similar feature
(in their Enterprise only edition).
>From https://msdn.microsoft.com/en-us/library/ms190273.aspx :

Adding NOT NULL Columns as an Online Operation

Starting with SQL Server 2012 Enterprise Edition, adding a NOT NULL column
with a default value is an online operation when the default value is
a *runtime
constant*. This means that the operation is completed almost
instantaneously regardless of the number of rows in the table. This is
because the existing rows in the table are not updated during the
operation; instead, the default value is stored only in the metadata of the
table and the value is looked up as needed in queries that access these
rows. This behavior is automatic; no additional syntax is required to
implement the online operation beyond the ADD COLUMN syntax. A runtime
constant is an expression that produces the same value at runtime for each
row in the table regardless of its determinism. For example, the constant
expression "My temporary data", or the system function GETUTCDATETIME() are
runtime constants. In contrast, the functions NEWID() or NEWSEQUENTIALID()
are not runtime constants because a unique value is produced for each row
in the table. Adding a NOT NULL column with a default value that is not a
runtime constant is always performed offline and an exclusive (SCH-M) lock
is acquired for the duration of the operation.

While the existing rows reference the value stored in metadata, the default
value is stored on the row for any new rows that are inserted and do not
specify another value for the column. The default value stored in metadata
is moved to an existing row when the row is updated (even if the actual
column is not specified in the UPDATE statement), or if the table or
clustered index is rebuilt.

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Serge Rielau

via Newton Mail 
[https://cloudmagic.com/k/d/mailapp?ct=dx&cv=9.0.74&pv=10.11.6&source=email_footer_2]
On Wed, Oct 5, 2016 at 3:58 PM, Tom Lane  wrote:
Andres Freund  writes:
> On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:
>>> No, "a second “exist default"" was mentioned, i.e. it is an additional
>>> column in a system table (pg_attribute) as default column values of
>>> the "pre-alter" era. It solves changing of the default expression of
>>> the same column later.

> Don't think that actually solves the issue. The default might be unset
> for a while, for example. Essentially you'd need to be able to associate
> arbitrary number of default values with an arbitrary set of rows.

I think it does work, as long as the "exists default" is immutable.
(For safety, personally, I'd restrict it to be a verbatim constant.)
The point is that you apply that when you are reading a row that has
so few columns that it must predate the original ALTER TABLE ADD COLUMN.
Subsequent ADD/DROP/SET DEFAULT affect the "active default" and hence
insertions that happen after them, but they don't affect the
interpretation of old rows. And of course all rows inserted after the
ADD COLUMN contain explicit values of the column, so their meaning is
unaffected in any case.
You do need two defaults associated with a column to make this work.
The "exists default" never changes after the column is added. But
in principle, the "exists default" just replaces the NULL value that
we implicitly insert now in such cases. Explained so much better than I could 
do it :-)
I want to point out as a minor “extension” that there is no need for the 
default to be immutable. It is merely required that the default is evaluate at 
time of ADD COLUMN and then we remember the actual value for the exist default, 
rather than the parsed expression as we do for the “current” default. Need a 
better name for the concept, since evidently this name isn't
conveying the idea. By all means. Got anything in mind?
Cheers Serge Rielau

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Tom Lane

I wrote:
> Need a better name for the concept, since evidently this name isn't
> conveying the idea.

Maybe "creation default" would work better?  Point being it's the
default value at the time of column creation.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Andres Freund

On 2016-10-05 18:58:47 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:
> >>> No, "a second “exist default"" was mentioned, i.e. it is an additional
> >>> column in a system table (pg_attribute) as default column values of
> >>> the "pre-alter" era. It solves changing of the default expression of
> >>> the same column later.
> 
> > Don't think that actually solves the issue. The default might be unset
> > for a while, for example. Essentially you'd need to be able to associate
> > arbitrary number of default values with an arbitrary set of rows.
> 
> I think it does work, as long as the "exists default" is immutable.
> (For safety, personally, I'd restrict it to be a verbatim constant.)
> The point is that you apply that when you are reading a row that has
> so few columns that it must predate the original ALTER TABLE ADD COLUMN.
> Subsequent ADD/DROP/SET DEFAULT affect the "active default" and hence
> insertions that happen after them, but they don't affect the
> interpretation of old rows.  And of course all rows inserted after the
> ADD COLUMN contain explicit values of the column, so their meaning is
> unaffected in any case.

Err, yes. I forgot that altering the default of an existing column
doesn't set the default for existing values. Sorry for the noise.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Vitaly Burovoy

On 10/5/16, Andres Freund  wrote:
> On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:
>> On Wed, Oct 5, 2016 at 3:29 PM, Andres Freund  wrote:
>> > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> > INSERT id = 1;
>> > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
>> > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> > INSERT id = 2;
>> > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
>> > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> > INSERT id = 3;
>> > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;
>> >
>> > The result here would be that there's three rows with a default value
>> > for foo that's the same as their id. None of them has that column
>> > present in the row.
>> >
>>
>> My understanding is that all of those would be materialized.
>
> But that'd require a table rewrite, as none of the above INSERTs were
> done when a default was in place.

Since they did not have the default value, that tuples are written
with actual TupleDesc.natts where att_isnull for "withdefault" column
is set (actually the column does not have default for inserted tuples
in your case).

> But each has a different "applicable" default value.

No, their values are constructed "from scratch", not fetched from a
heap, so "pre-alter-add-column" default is not applicable for them.

>> The only
>> default that isn't materialized is the one in effect in the same
>> statement
>> in which that column was added.  Since a column can only be added once,
>> the
>> default in effect at the time the column was added can never change, no
>> matter what you do to the default later on.
>
> DROP DEFAULT pretty much does that, because it allows multiple (set of)
> rows with no value (or a NULL) for a specific column, but with differing
> applicable default values.

DROP DEFAULT is for "post-alter-add-column" tuples, it does not
affects "pre-alter-add-column" ones.

-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Pantelis Theodosiou

On Wed, Oct 5, 2016 at 11:44 PM, Jeff Janes  wrote:

> On Wed, Oct 5, 2016 at 3:29 PM, Andres Freund  wrote:
>
>> On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:
>> > On 10/5/16, Andres Freund  wrote:
>> > > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
>> > >> Dear Hackers,
>> > >> I’m working on a patch that expands PG’s ability to add columns to a
>> table
>> > >> without a table rewrite (i.e. at O(1) cost) from the
>> > >> nullable-without-default to a more general case.
>> > >
>> > > If I understand this proposal correctly, altering a column default
>> will
>> > > still have trigger a rewrite unless there's previous default?
>> >
>> > No, "a second “exist default"" was mentioned, i.e. it is an additional
>> > column in a system table (pg_attribute) as default column values of
>> > the "pre-alter" era. It solves changing of the default expression of
>> > the same column later.
>>
>> Don't think that actually solves the issue. The default might be unset
>> for a while, for example. Essentially you'd need to be able to associate
>> arbitrary number of default values with an arbitrary set of rows.
>>
>>
>> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> INSERT id = 1;
>> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
>> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> INSERT id = 2;
>> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
>> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> INSERT id = 3;
>> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;
>>
>> The result here would be that there's three rows with a default value
>> for foo that's the same as their id. None of them has that column
>> present in the row.
>>
>
> My understanding is that all of those would be materialized.  The only
> default that isn't materialized is the one in effect in the same statement
> in which that column was added.  Since a column can only be added once, the
> default in effect at the time the column was added can never change, no
> matter what you do to the default later on.
>
> Cheers,
>
> Jeff
>

I understood the same thing. That when the column is added the "DEFAULT
constant" means 2 things:

-a- existing rows get a value of that constant (that is not actually
written in the rows, but kept (hidden from the user) in the system tables
and only written in the rows that are updated, vacuumed, etc) and
-b- new rows, inserted after the ADD COLUMN will get the DEFAULT constant,
same way as a normal column definition would.

The b part can easily be changed later with an ALTER COLUMN that sets a new
DEFAULT.
The a part is never changed - but possibly deleted from the system table
when all rows existing before the ADD COLUMN have been updated.


Pantelis

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Tom Lane

Andres Freund  writes:
> On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:
>>> No, "a second âexist default"" was mentioned, i.e. it is an additional
>>> column in a system table (pg_attribute) as default column values of
>>> the "pre-alter" era. It solves changing of the default expression of
>>> the same column later.

> Don't think that actually solves the issue. The default might be unset
> for a while, for example. Essentially you'd need to be able to associate
> arbitrary number of default values with an arbitrary set of rows.

I think it does work, as long as the "exists default" is immutable.
(For safety, personally, I'd restrict it to be a verbatim constant.)
The point is that you apply that when you are reading a row that has
so few columns that it must predate the original ALTER TABLE ADD COLUMN.
Subsequent ADD/DROP/SET DEFAULT affect the "active default" and hence
insertions that happen after them, but they don't affect the
interpretation of old rows.  And of course all rows inserted after the
ADD COLUMN contain explicit values of the column, so their meaning is
unaffected in any case.

You do need two defaults associated with a column to make this work.
The "exists default" never changes after the column is added.  But
in principle, the "exists default" just replaces the NULL value that
we implicitly insert now in such cases.

Need a better name for the concept, since evidently this name isn't
conveying the idea.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Andres Freund

On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:
> On Wed, Oct 5, 2016 at 3:29 PM, Andres Freund  wrote:
> 
> > On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:
> > > On 10/5/16, Andres Freund  wrote:
> > > > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
> > > >> Dear Hackers,
> > > >> I’m working on a patch that expands PG’s ability to add columns to a
> > table
> > > >> without a table rewrite (i.e. at O(1) cost) from the
> > > >> nullable-without-default to a more general case.
> > > >
> > > > If I understand this proposal correctly, altering a column default will
> > > > still have trigger a rewrite unless there's previous default?
> > >
> > > No, "a second “exist default"" was mentioned, i.e. it is an additional
> > > column in a system table (pg_attribute) as default column values of
> > > the "pre-alter" era. It solves changing of the default expression of
> > > the same column later.
> >
> > Don't think that actually solves the issue. The default might be unset
> > for a while, for example. Essentially you'd need to be able to associate
> > arbitrary number of default values with an arbitrary set of rows.
> >
> >
> > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
> > INSERT id = 1;
> > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
> > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
> > INSERT id = 2;
> > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
> > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
> > INSERT id = 3;
> > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;
> >
> > The result here would be that there's three rows with a default value
> > for foo that's the same as their id. None of them has that column
> > present in the row.
> >
> 
> My understanding is that all of those would be materialized.

But that'd require a table rewrite, as none of the above INSERTs were
done when a default was in place. But each has a different "applicable"
default value.


> The only
> default that isn't materialized is the one in effect in the same statement
> in which that column was added.  Since a column can only be added once, the
> default in effect at the time the column was added can never change, no
> matter what you do to the default later on.

DROP DEFAULT pretty much does that, because it allows multiple (set of)
rows with no value (or a NULL) for a specific column, but with differing
applicable default values.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Jeff Janes

On Wed, Oct 5, 2016 at 3:29 PM, Andres Freund  wrote:

> On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:
> > On 10/5/16, Andres Freund  wrote:
> > > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
> > >> Dear Hackers,
> > >> I’m working on a patch that expands PG’s ability to add columns to a
> table
> > >> without a table rewrite (i.e. at O(1) cost) from the
> > >> nullable-without-default to a more general case.
> > >
> > > If I understand this proposal correctly, altering a column default will
> > > still have trigger a rewrite unless there's previous default?
> >
> > No, "a second “exist default"" was mentioned, i.e. it is an additional
> > column in a system table (pg_attribute) as default column values of
> > the "pre-alter" era. It solves changing of the default expression of
> > the same column later.
>
> Don't think that actually solves the issue. The default might be unset
> for a while, for example. Essentially you'd need to be able to associate
> arbitrary number of default values with an arbitrary set of rows.
>
>
> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
> INSERT id = 1;
> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
> INSERT id = 2;
> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
> INSERT id = 3;
> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;
>
> The result here would be that there's three rows with a default value
> for foo that's the same as their id. None of them has that column
> present in the row.
>

My understanding is that all of those would be materialized.  The only
default that isn't materialized is the one in effect in the same statement
in which that column was added.  Since a column can only be added once, the
default in effect at the time the column was added can never change, no
matter what you do to the default later on.

Cheers,

Jeff

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Vitaly Burovoy

On 10/5/16, Vitaly Burovoy  wrote:
> On 10/5/16, Andres Freund  wrote:
>> On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:
>>> On 10/5/16, Andres Freund  wrote:
>>> > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
>>> >> Dear Hackers,
>>> >> I’m working on a patch that expands PG’s ability to add columns to a
>>> >> table
>>> >> without a table rewrite (i.e. at O(1) cost) from the
>>> >> nullable-without-default to a more general case.
>>> >
>>> > If I understand this proposal correctly, altering a column default
>>> > will
>>> > still have trigger a rewrite unless there's previous default?
>>>
>>> No, "a second “exist default"" was mentioned, i.e. it is an additional
>>> column in a system table (pg_attribute) as default column values of
>>> the "pre-alter" era. It solves changing of the default expression of
>>> the same column later.
>>
>> Don't think that actually solves the issue. The default might be unset
>> for a while, for example. Essentially you'd need to be able to associate
>> arbitrary number of default values with an arbitrary set of rows.
>>
>> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> INSERT id = 1;
>> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
>> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> INSERT id = 2;
>> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
>> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> INSERT id = 3;
>> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;
>>
>> The result here would be that there's three rows with a default value
>> for foo that's the same as their id. None of them has that column
>> present in the row.
>
> I'm sorry, while I was writting "pre-alter" I meant
> "pre-alter-add-column" era (not "pre-alter-set-default"), all later
> default changes "current" default, whereas "pre-alter-add-column" adds
> value if current column number < TupleDesc.natts.
>
> All your DDL are in the "post-alter-add-column" era.

I'm so sorry, I was in a hurry. Of course,
- if current column number < TupleDesc.natts.
+ if current column number > TupleDesc.natts.

-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Vitaly Burovoy

On 10/5/16, Andres Freund  wrote:
> On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:
>> On 10/5/16, Andres Freund  wrote:
>> > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
>> >> Dear Hackers,
>> >> I’m working on a patch that expands PG’s ability to add columns to a
>> >> table
>> >> without a table rewrite (i.e. at O(1) cost) from the
>> >> nullable-without-default to a more general case.
>> >
>> > If I understand this proposal correctly, altering a column default will
>> > still have trigger a rewrite unless there's previous default?
>>
>> No, "a second “exist default"" was mentioned, i.e. it is an additional
>> column in a system table (pg_attribute) as default column values of
>> the "pre-alter" era. It solves changing of the default expression of
>> the same column later.
>
> Don't think that actually solves the issue. The default might be unset
> for a while, for example. Essentially you'd need to be able to associate
> arbitrary number of default values with an arbitrary set of rows.
>
> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
> INSERT id = 1;
> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
> INSERT id = 2;
> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
> INSERT id = 3;
> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;
>
> The result here would be that there's three rows with a default value
> for foo that's the same as their id. None of them has that column
> present in the row.

I'm sorry, while I was writting "pre-alter" I meant
"pre-alter-add-column" era (not "pre-alter-set-default"), all later
default changes "current" default, whereas "pre-alter-add-column" adds
value if current column number < TupleDesc.natts.

All your DDL are in the "post-alter-add-column" era.

-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Serge Rielau

On Wed, Oct 5, 2016 at 3:23 PM, Vitaly Burovoy  wrote:
On 10/5/16, Andres Freund  wrote:
> On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
>> Dear Hackers,
>> I’m working on a patch that expands PG’s ability to add columns to a table
>> without a table rewrite (i.e. at O(1) cost) from the
>> nullable-without-default to a more general case.
> If I understand this proposal correctly, altering a column default will
> still have trigger a rewrite unless there's previous default?
No, "a second “exist default"" was mentioned, i.e. it is an additional
column in a system table (pg_attribute) as default column values of
the "pre-alter" era. It solves changing of the default expression of
the same column later. Correct and good guess on pg_attribute. That’s where 
it’s living in my proposal.
Cheers Serge

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Andres Freund

On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:
> On 10/5/16, Andres Freund  wrote:
> > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
> >> Dear Hackers,
> >> I’m working on a patch that expands PG’s ability to add columns to a table
> >> without a table rewrite (i.e. at O(1) cost) from the
> >> nullable-without-default to a more general case.
> >
> > If I understand this proposal correctly, altering a column default will
> > still have trigger a rewrite unless there's previous default?
>
> No, "a second “exist default"" was mentioned, i.e. it is an additional
> column in a system table (pg_attribute) as default column values of
> the "pre-alter" era. It solves changing of the default expression of
> the same column later.

Don't think that actually solves the issue. The default might be unset
for a while, for example. Essentially you'd need to be able to associate
arbitrary number of default values with an arbitrary set of rows.


ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 1;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 2;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 3;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;

The result here would be that there's three rows with a default value
for foo that's the same as their id. None of them has that column
present in the row.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Vitaly Burovoy

On 10/5/16, Andres Freund  wrote:
> On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
>> Dear Hackers,
>> I’m working on a patch that expands PG’s ability to add columns to a table
>> without a table rewrite (i.e. at O(1) cost) from the
>> nullable-without-default to a more general case.
>
> If I understand this proposal correctly, altering a column default will
> still have trigger a rewrite unless there's previous default?

No, "a second “exist default"" was mentioned, i.e. it is an additional
column in a system table (pg_attribute) as default column values of
the "pre-alter" era. It solves changing of the default expression of
the same column later.

-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Andres Freund

On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
> Dear Hackers,
> I’m working on a patch that expands PG’s ability to add columns to a table 
> without a table rewrite (i.e. at O(1) cost) from the nullable-without-default 
> to a more general case. E.g. CREATE TABLE T(pk INT NOT NULL PRIMARY KEY); 
> INSERT INTO T VALEUS (1), (2), (3); ALTER TABLE T ADD COLUMN c1 INTEGER NOT 
> NULL DEFAULT 5; INSERT INTO T VALUES (4, DEFAULT); ALTER TABLE T ALTER COLUMN 
> SET DEFAULT 6; INSERT INTO T VALUS (5, DEFAULT); SELECT * FROM T ORDER BY pk; 
> => (1, 5), (2, 5), (3, 5), (4, 5), (5, 6);
> Rows 1-3 have never been updated, yet they know that their values of c1 is 5.
> The requirement is driven by large tables for which add column takes too much 
> time and/or produces too large a transaction for comfort.
> In simplified terms: * a second “exist default” is computed and stored in the 
> catalogs at time of AT ADD COLUMN * The exist default is cached in the tuple 
> descriptor (e.g in attrdef) * When one of the getAttr or copytuple related 
> routines is invoked the exist default is filled in instead of simply NULL 
> padding if the tuple is shorter the requested attribute number.

If I understand this proposal correctly, altering a column default will
still have trigger a rewrite unless there's previous default?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Serge Rielau

On Wed, Oct 5, 2016 at 2:45 PM, Vitaly Burovoy  wrote:
On 10/5/16, Serge Rielau  wrote:
> Dear Hackers,
>
> I’m working on a patch that expands PG’s ability to add columns to a table
> without a table rewrite (i.e. at O(1) cost) from the
> nullable-without-default to a more general case. E.g.
...
> Is there an interest in principle in the community for this functionality?

Wow! I think it would be great! It also solves huge vacuuming after
rewriting the table(s).
Just pay attention to corner cases like indexes, statistics and speed. Yes, 
Yes, and still analyzing speed
But I'd like to see solution for more important cases like:
CREATE TABLE t (pk INT NOT NULL PRIMARY KEY);
INSERT INTO t VALUES (1), (2), (3);
ALTER TABLE t ADD COLUMN c1 timestamptz NOT NULL DEFAULT 'now';
SELECT * FROM t ORDER BY pk;
ALTER TABLE t ADD COLUMN c2 serial;
SELECT * FROM t ORDER BY pk;
INSERT INTO t(pk) VALUES (4);
SELECT * FROM t ORDER BY pk; By solution I think you mean a semantic change 
from what it is doing today which is: * “Now” is fixed to ALTER TABLE time for 
all pre-existing rows * serial will fill in the same value for all pre-existing 
rows Having different semantics for those would require a rewrite and probably 
different syntax in some form.
This is what my patch does on our PG derivative today: CREATE TABLE t (pk INT 
NOT NULL PRIMARY KEY); CREATE TABLE postgres=# INSERT INTO t VALUES (1), (2), 
(3); INSERT 0 3 postgres=# ALTER TABLE t ADD COLUMN c1 timestamptz NOT NULL 
DEFAULT 'now'; ALTER TABLE postgres=# SELECT * FROM t ORDER BY pk; pk | c1 
+--- 1 | 2016-10-05 21:47:58.919194+00 2 | 
2016-10-05 21:47:58.919194+00 3 | 2016-10-05 21:47:58.919194+00 (3 rows)
postgres=# postgres=# ALTER TABLE t ADD COLUMN c2 serial; SELECT * FROM t ORDER 
BY pk; INSERT INTO t(pk) VALUES (4); SELECT * FROM t ORDER BY pk;
ALTER TABLE t ADD COLUMN c2 serial; ALTER TABLE postgres=# SELECT * FROM t 
ORDER BY pk; pk | c1 | c2 +---+ 1 | 
2016-10-05 21:47:58.919194+00 | 1 2 | 2016-10-05 21:47:58.919194+00 | 1 3 | 
2016-10-05 21:47:58.919194+00 | 1 (3 rows)
postgres=# INSERT INTO t(pk) VALUES (4); INSERT 0 1 postgres=# SELECT * FROM t 
ORDER BY pk; pk | c1 | c2 +---+ 1 | 
2016-10-05 21:47:58.919194+00 | 1 2 | 2016-10-05 21:47:58.919194+00 | 1 3 | 
2016-10-05 21:47:58.919194+00 | 1 4 | 2016-10-05 21:47:58.919194+00 | 2 (4 
rows) P.S.: I really think it is a good idea, just some research is
necessary and covering corner cases... Thanks. This would be my first 
contribution. I take it I would post a patch based on a recent PG 9.6 master 
for review? Or should I compose some sort of a design document?
Cheers Serge Rielau Salesforce.com

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Vitaly Burovoy

On 10/5/16, Serge Rielau  wrote:
> Dear Hackers,
>
> I’m working on a patch that expands PG’s ability to add columns to a table
> without a table rewrite (i.e. at O(1) cost) from the
> nullable-without-default to a more general case. E.g.
>
> CREATE TABLE T(pk INT NOT NULL PRIMARY KEY);
> INSERT INTO T VALEUS (1), (2), (3);
> ALTER TABLE T ADD COLUMN c1 INTEGER NOT NULL DEFAULT 5;
> INSERT INTO T VALUES (4, DEFAULT);
> ALTER TABLE T ALTER COLUMN SET DEFAULT 6;
> INSERT INTO T VALUS (5, DEFAULT);
> SELECT * FROM T ORDER BY pk;
> =>
> (1, 5),
> (2, 5),
> (3, 5),
> (4, 5),
> (5, 6);
>
> Rows 1-3 have never been updated, yet they know that their values of c1 is
> 5.
>
> The requirement is driven by large tables for which add column takes too
> much time and/or produces too large a transaction for comfort.
>
> In simplified terms:
>
> * a second “exist default” is computed and stored in
> the catalogs at time of AT ADD COLUMN
>
> * The exist default is cached in the tuple descriptor (e.g in attrdef)
>
> * When one of the getAttr or copytuple related routines is invoked
> the exist default is filled in instead of simply NULL padding if the
> tuple is shorter the requested attribute number.
>
> Is there an interest in principle in the community for this functionality?

Wow! I think it would be great! It also solves huge vacuuming after
rewriting the table(s).
Just pay attention to corner cases like indexes, statistics and speed.

But I'd like to see solution for more important cases like:
CREATE TABLE t (pk INT NOT NULL PRIMARY KEY);
INSERT INTO t VALUES (1), (2), (3);
ALTER TABLE t ADD COLUMN c1 timestamptz NOT NULL DEFAULT 'now';
SELECT * FROM t ORDER BY pk;
ALTER TABLE t ADD COLUMN c2 serial;
SELECT * FROM t ORDER BY pk;
INSERT INTO t(pk) VALUES (4);
SELECT * FROM t ORDER BY pk;

P.S.: I really think it is a good idea, just some research is
necessary and covering corner cases...

-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Serge Rielau

Dear Hackers,
I’m working on a patch that expands PG’s ability to add columns to a table 
without a table rewrite (i.e. at O(1) cost) from the nullable-without-default 
to a more general case. E.g. CREATE TABLE T(pk INT NOT NULL PRIMARY KEY); 
INSERT INTO T VALEUS (1), (2), (3); ALTER TABLE T ADD COLUMN c1 INTEGER NOT 
NULL DEFAULT 5; INSERT INTO T VALUES (4, DEFAULT); ALTER TABLE T ALTER COLUMN 
SET DEFAULT 6; INSERT INTO T VALUS (5, DEFAULT); SELECT * FROM T ORDER BY pk; 
=> (1, 5), (2, 5), (3, 5), (4, 5), (5, 6);
Rows 1-3 have never been updated, yet they know that their values of c1 is 5.
The requirement is driven by large tables for which add column takes too much 
time and/or produces too large a transaction for comfort.
In simplified terms: * a second “exist default” is computed and stored in the 
catalogs at time of AT ADD COLUMN * The exist default is cached in the tuple 
descriptor (e.g in attrdef) * When one of the getAttr or copytuple related 
routines is invoked the exist default is filled in instead of simply NULL 
padding if the tuple is shorter the requested attribute number.

Is there an interest in principle in the community for this functionality?
Cheers Serge Rielau Salesforce.com
via Newton Mail 
[https://cloudmagic.com/k/d/mailapp?ct=dx&cv=9.0.74&pv=10.11.6&source=email_footer_2]

37 matches

Mail list logo