Hi Thomas,

On 2017/08/07 10:58, Thomas Munro wrote:
> Hi hackers,
> 
> If you drop a column from a partitioned table then it has a TupleDesc
> that matches existing partitions, but new partitions created after
> that have non-same TupleDescs (according to convert_tuples_by_name)
> because they don't have the dropped column.  That means that inserts
> to partitions created later need to go via the deform->remap->form
> code path in tupconvert.c.  If you're using a time-based partitioning
> scheme where you add a new partition for each month and mostly insert
> into the current month, as is very common, then after dropping a
> column you'll eventually finish up sending ALL your inserts through
> tupconvert.c for the rest of time.

That's good observation.

> For example, having hacked my tree to print out a message to tell me
> if it had to convert a tuple:
> 
> postgres=# create table parent (a int, b int) partition by list (b);
> CREATE TABLE
> postgres=# create table child1 partition of parent for values in (1);
> CREATE TABLE
> postgres=# alter table parent drop column a;
> ALTER TABLE
> postgres=# create table child2 partition of parent for values in (2);
> CREATE TABLE
> postgres=# insert into parent values (1);
> NOTICE:  no map
> INSERT 0 1
> postgres=# insert into parent values (2);
> NOTICE:  map!
> INSERT 0 1
> 
> Of course there are other usage patterns where you might prefer it
> this way, because you'll mostly be inserting into partitions created
> before the change.  In general, would it be better for the partitioned
> table's TupleDesc to match partitions created before or after a
> change?  Since partitioned tables have no storage themselves, is there
> any technical reason we couldn't remove a partitioned table's dropped
> pg_attribute so that its TupleDesc matches partitions created later?

That means the parent's TupleDesc will begin mismatching that of all of
the existing partitions and they will suddenly need a map where they
didn't before.

I guess you considered it, but optimizing for the common case of range
partitioning where most of the inserts go to the newest partition will
hurt the other partitioning methods, like hash, where that won't
necessarily be true.

> Is there some way that tupconvert.c could make this type of difference
> moot?

Do you mean the difference arising due to dropped columns in either the
partitioned table or the table attached as a partition?

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to