On Wed, Nov 30, 2016 at 10:56 AM, Amit Langote <amitlangot...@gmail.com> wrote: > The latest patch I posted earlier today has this implementation.
I decided to try out these patches today with #define CLOBBER_CACHE_ALWAYS 1 in pg_config_manual.h, which found a couple of problems: 1. RelationClearRelation() wasn't preserving the rd_partkey, even though there's plenty of code that relies on it not changing while we hold a lock on the relation - in particular, transformPartitionBound. 2. partition_bounds_equal() was using the comparator and collation for partitioning column 0 to compare the datums for all partitioning columns. It's amazing this passed the regression tests. The attached incremental patch fixes those things and some cosmetic issues I found along the way. 3. RelationGetPartitionDispatchInfo() is badly broken: 1010 pd[i] = (PartitionDispatch) palloc(sizeof(PartitionDispatchData)); 1011 pd[i]->relid = RelationGetRelid(partrel); 1012 pd[i]->key = RelationGetPartitionKey(partrel); 1013 pd[i]->keystate = NIL; 1014 pd[i]->partdesc = partdesc; 1015 pd[i]->indexes = (int *) palloc(partdesc->nparts * sizeof(int)); 1016 heap_close(partrel, NoLock); 1017 1018 m = 0; 1019 for (j = 0; j < partdesc->nparts; j++) 1020 { 1021 Oid partrelid = partdesc->oids[j]; This code imagines that pointers it extracted from partrel are certain to remain valid after heap_close(partrel, NoLock), perhaps on the strength of the fact that we still retain a lock on the relation. But this isn't the case. As soon as nobody has the relation open, a call to RelationClearRelation() will destroy the relcache entry and everything to which it points; with CLOBBER_CACHE_ALWAYS, I see a failure at line 1021: #0 RelationGetPartitionDispatchInfo (rel=0x1136dddf8, lockmode=3, leaf_part_oids=0x7fff5633b938) at partition.c:1021 1021 Oid partrelid = partdesc->oids[j]; (gdb) bt 5 #0 RelationGetPartitionDispatchInfo (rel=0x1136dddf8, lockmode=3, leaf_part_oids=0x7fff5633b938) at partition.c:1021 #1 0x0000000109b8d71f in ExecInitModifyTable (node=0x7fd12984d750, estate=0x7fd12b885438, eflags=0) at nodeModifyTable.c:1730 #2 0x0000000109b5e7ac in ExecInitNode (node=0x7fd12984d750, estate=0x7fd12b885438, eflags=0) at execProcnode.c:159 #3 0x0000000109b58548 in InitPlan (queryDesc=0x7fd12b87b638, eflags=0) at execMain.c:961 #4 0x0000000109b57dcd in standard_ExecutorStart (queryDesc=0x7fd12b87b638, eflags=0) at execMain.c:239 (More stack frames follow...) Current language: auto; currently minimal (gdb) p debug_query_string $1 = 0x7fd12b84c238 "insert into list_parted values (null, 1);" (gdb) p partdesc[0] $2 = { nparts = 2139062143, oids = 0x7f7f7f7f7f7f7f7f, boundinfo = 0x7f7f7f7f7f7f7f7f } As you can see, the partdesc is no longer valid here. I'm not immediately sure how to fix this; this isn't a simple thinko. You need to keep the relations open for the whole duration of the query, not just long enough to build the dispatch info. I think you should try to revise this so that each relation is opened once and kept open; maybe the first loop should be making a pointer-list of Relations rather than an int-list of relation OIDs. And it seems to me (though I'm getting a little fuzzy here because it's late) that you need all of the partitions open, not just the ones that are subpartitioned, because how else are you going to know how to remap the tuple if the column order is different? But I don't see this code doing that, which makes me wonder if the partitions are being opened yet again in some other location. I recommend that once you fix this, you run 'make check' with #define CLOBBER_CACHE_ALWAYS 1 and look for other hazards. Such mistakes are easy to make with this kind of patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index 83dc151..cc9009d 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -53,16 +53,16 @@ * Information about bounds of a partitioned relation * * A list partition datum that is known to be NULL is never put into the - * datums array, instead it is tracked using has_null and null_index fields. + * datums array. Instead, it is tracked using has_null and null_index fields. * - * In case of range partitioning, ndatums is far less than 2 * nparts, because - * a partition's upper bound and the next partition's lower bound are same - * in most common cases, and we only store one of them. + * In the case of range partitioning, ndatums will typically be far less than + * 2 * nparts, because a partition's upper bound and the next partition's lower + * bound are the same in most common cases, and we only store one of them. * - * In case of list partitioning, the indexes array stores one entry for every - * datum, which is the index of the partition that accepts a given datum. - * Wheareas, in case of range partitioning, it stores one entry per distinct - * range datum, which is the index of the partition of which a given datum + * In the case of list partitioning, the indexes array stores one entry for + * every datum, which is the index of the partition that accepts a given datum. + * In case of range partitioning, it stores one entry per distinct range + * datum, which is the index of the partition for which a given datum * is an upper bound. */ @@ -135,16 +135,19 @@ typedef struct PartitionDispatchData int *indexes; } PartitionDispatchData; -static int32 qsort_partition_list_value_cmp(const void *a, const void *b, void *arg); -static int32 qsort_partition_rbound_cmp(const void *a, const void *b, void *arg); +static int32 qsort_partition_list_value_cmp(const void *a, const void *b, + void *arg); +static int32 qsort_partition_rbound_cmp(const void *a, const void *b, + void *arg); static List *get_qual_for_list(PartitionKey key, PartitionBoundSpec *spec); static List *get_qual_for_range(PartitionKey key, PartitionBoundSpec *spec); -static Oid get_partition_operator(PartitionKey key, int col, StrategyNumber strategy, - bool *need_relabel); +static Oid get_partition_operator(PartitionKey key, int col, + StrategyNumber strategy, bool *need_relabel); static List *generate_partition_qual(Relation rel, bool recurse); -static PartitionRangeBound *make_one_range_bound(PartitionKey key, int index, List *datums, bool lower); +static PartitionRangeBound *make_one_range_bound(PartitionKey key, int index, + List *datums, bool lower); static int32 partition_rbound_cmp(PartitionKey key, Datum *datums1, RangeDatumContent *content1, bool lower1, PartitionRangeBound *b2); @@ -152,9 +155,11 @@ static int32 partition_rbound_datum_cmp(PartitionKey key, Datum *rb_datums, RangeDatumContent *rb_content, Datum *tuple_datums); -static int32 partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo, +static int32 partition_bound_cmp(PartitionKey key, + PartitionBoundInfo boundinfo, int offset, void *probe, bool probe_is_bound); -static int partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo, +static int partition_bound_bsearch(PartitionKey key, + PartitionBoundInfo boundinfo, void *probe, bool probe_is_bound, bool *is_equal); /* Support get_partition_for_tuple() */ @@ -636,8 +641,8 @@ partition_bounds_equal(PartitionKey key, { int32 cmpval; - cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0], - key->partcollation[0], + cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[j], + key->partcollation[j], b1->datums[i][j], b2->datums[i][j])); if (cmpval != 0) @@ -976,7 +981,7 @@ RelationGetPartitionDispatchInfo(Relation rel, int lockmode, PartitionDesc partdesc = RelationGetPartitionDesc(partrel); /* - * If this partition is a partitined table, add its children to to the + * If this partition is a partitioned table, add its children to the * end of the list, so that they are processed as well. */ if (partdesc) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index abfb46b..c77b216 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -1990,9 +1990,9 @@ MergeAttributes(List *schema, List *supers, char relpersistence, pfree(newattno); /* - * Close the parent rel, but keep our ShareUpdateExclusiveLock on it - * until xact commit. That will prevent someone else from deleting or - * ALTERing the parent before the child is committed. + * Close the parent rel, but keep our lock on it until xact commit. + * That will prevent someone else from deleting or ALTERing the parent + * before the child is committed. */ heap_close(relation, NoLock); } diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c index c958092..2da7ae3 100644 --- a/src/backend/utils/cache/relcache.c +++ b/src/backend/utils/cache/relcache.c @@ -1192,7 +1192,7 @@ equalPartitionDescs(PartitionKey key, PartitionDesc partdesc1, /* * Now compare partition bound collections. The logic to iterate over - * the collections is local to partition.c. + * the collections is private to partition.c. */ if (partdesc1->boundinfo != NULL) { @@ -2604,7 +2604,9 @@ RelationClearRelation(Relation relation, bool rebuild) SWAPFIELD(Oid, rd_toastoid); /* pgstat_info must be preserved */ SWAPFIELD(struct PgStat_TableStatus *, pgstat_info); - + /* partition key must be preserved */ + SWAPFIELD(PartitionKey, rd_partkey); + SWAPFIELD(MemoryContext, rd_partkeycxt); /* preserve old partdesc if no logical change */ if (keep_partdesc) {
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers