On Wed, Nov 30, 2016 at 10:56 AM, Amit Langote <amitlangot...@gmail.com> wrote:
> The latest patch I posted earlier today has this implementation.

I decided to try out these patches today with #define
CLOBBER_CACHE_ALWAYS 1 in pg_config_manual.h, which found a couple of
problems:

1. RelationClearRelation() wasn't preserving the rd_partkey, even
though there's plenty of code that relies on it not changing while we
hold a lock on the relation - in particular, transformPartitionBound.

2. partition_bounds_equal() was using the comparator and collation for
partitioning column 0 to compare the datums for all partitioning
columns.  It's amazing this passed the regression tests.

The attached incremental patch fixes those things and some cosmetic
issues I found along the way.

3. RelationGetPartitionDispatchInfo() is badly broken:

1010         pd[i] = (PartitionDispatch) palloc(sizeof(PartitionDispatchData));
1011         pd[i]->relid = RelationGetRelid(partrel);
1012         pd[i]->key = RelationGetPartitionKey(partrel);
1013         pd[i]->keystate = NIL;
1014         pd[i]->partdesc = partdesc;
1015         pd[i]->indexes = (int *) palloc(partdesc->nparts * sizeof(int));
1016         heap_close(partrel, NoLock);
1017
1018         m = 0;
1019         for (j = 0; j < partdesc->nparts; j++)
1020         {
1021             Oid     partrelid = partdesc->oids[j];

This code imagines that pointers it extracted from partrel are certain
to remain valid after heap_close(partrel, NoLock), perhaps on the
strength of the fact that we still retain a lock on the relation.  But
this isn't the case.  As soon as nobody has the relation open, a call
to RelationClearRelation() will destroy the relcache entry and
everything to which it points; with CLOBBER_CACHE_ALWAYS, I see a
failure at line 1021:

#0  RelationGetPartitionDispatchInfo (rel=0x1136dddf8, lockmode=3,
leaf_part_oids=0x7fff5633b938) at partition.c:1021
1021                Oid        partrelid = partdesc->oids[j];
(gdb) bt 5
#0  RelationGetPartitionDispatchInfo (rel=0x1136dddf8, lockmode=3,
leaf_part_oids=0x7fff5633b938) at partition.c:1021
#1  0x0000000109b8d71f in ExecInitModifyTable (node=0x7fd12984d750,
estate=0x7fd12b885438, eflags=0) at nodeModifyTable.c:1730
#2  0x0000000109b5e7ac in ExecInitNode (node=0x7fd12984d750,
estate=0x7fd12b885438, eflags=0) at execProcnode.c:159
#3  0x0000000109b58548 in InitPlan (queryDesc=0x7fd12b87b638,
eflags=0) at execMain.c:961
#4  0x0000000109b57dcd in standard_ExecutorStart
(queryDesc=0x7fd12b87b638, eflags=0) at execMain.c:239
(More stack frames follow...)
Current language:  auto; currently minimal
(gdb) p debug_query_string
$1 = 0x7fd12b84c238 "insert into list_parted values (null, 1);"
(gdb) p partdesc[0]
$2 = {
  nparts = 2139062143,
  oids = 0x7f7f7f7f7f7f7f7f,
  boundinfo = 0x7f7f7f7f7f7f7f7f
}

As you can see, the partdesc is no longer valid here.  I'm not
immediately sure how to fix this; this isn't a simple thinko.  You
need to keep the relations open for the whole duration of the query,
not just long enough to build the dispatch info.  I think you should
try to revise this so that each relation is opened once and kept open;
maybe the first loop should be making a pointer-list of Relations
rather than an int-list of relation OIDs.  And it seems to me (though
I'm getting a little fuzzy here because it's late) that you need all
of the partitions open, not just the ones that are subpartitioned,
because how else are you going to know how to remap the tuple if the
column order is different?  But I don't see this code doing that,
which makes me wonder if the partitions are being opened yet again in
some other location.

I recommend that once you fix this, you run 'make check' with #define
CLOBBER_CACHE_ALWAYS 1 and look for other hazards.  Such mistakes are
easy to make with this kind of patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 83dc151..cc9009d 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -53,16 +53,16 @@
  * Information about bounds of a partitioned relation
  *
  * A list partition datum that is known to be NULL is never put into the
- * datums array, instead it is tracked using has_null and null_index fields.
+ * datums array. Instead, it is tracked using has_null and null_index fields.
  *
- * In case of range partitioning, ndatums is far less than 2 * nparts, because
- * a partition's upper bound and the next partition's lower bound are same
- * in most common cases, and we only store one of them.
+ * In the case of range partitioning, ndatums will typically be far less than
+ * 2 * nparts, because a partition's upper bound and the next partition's lower
+ * bound are the same in most common cases, and we only store one of them.
  *
- * In case of list partitioning, the indexes array stores one entry for every
- * datum, which is the index of the partition that accepts a given datum.
- * Wheareas, in case of range partitioning, it stores one entry per distinct
- * range datum, which is the index of the partition of which a given datum
+ * In the case of list partitioning, the indexes array stores one entry for
+ * every datum, which is the index of the partition that accepts a given datum.
+ * In case of range partitioning, it stores one entry per distinct range
+ * datum, which is the index of the partition for which a given datum
  * is an upper bound.
  */
 
@@ -135,16 +135,19 @@ typedef struct PartitionDispatchData
 	int					   *indexes;
 } PartitionDispatchData;
 
-static int32 qsort_partition_list_value_cmp(const void *a, const void *b, void *arg);
-static int32 qsort_partition_rbound_cmp(const void *a, const void *b, void *arg);
+static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
+							   void *arg);
+static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
+						   void *arg);
 
 static List *get_qual_for_list(PartitionKey key, PartitionBoundSpec *spec);
 static List *get_qual_for_range(PartitionKey key, PartitionBoundSpec *spec);
-static Oid get_partition_operator(PartitionKey key, int col, StrategyNumber strategy,
-					   bool *need_relabel);
+static Oid get_partition_operator(PartitionKey key, int col,
+					   StrategyNumber strategy, bool *need_relabel);
 static List *generate_partition_qual(Relation rel, bool recurse);
 
-static PartitionRangeBound *make_one_range_bound(PartitionKey key, int index, List *datums, bool lower);
+static PartitionRangeBound *make_one_range_bound(PartitionKey key, int index,
+					 List *datums, bool lower);
 static int32 partition_rbound_cmp(PartitionKey key,
 					 Datum *datums1, RangeDatumContent *content1, bool lower1,
 					 PartitionRangeBound *b2);
@@ -152,9 +155,11 @@ static int32 partition_rbound_datum_cmp(PartitionKey key,
 						   Datum *rb_datums, RangeDatumContent *rb_content,
 						   Datum *tuple_datums);
 
-static int32 partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
+static int32 partition_bound_cmp(PartitionKey key,
+					PartitionBoundInfo boundinfo,
 					int offset, void *probe, bool probe_is_bound);
-static int partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
+static int partition_bound_bsearch(PartitionKey key,
+						PartitionBoundInfo boundinfo,
 						void *probe, bool probe_is_bound, bool *is_equal);
 
 /* Support get_partition_for_tuple() */
@@ -636,8 +641,8 @@ partition_bounds_equal(PartitionKey key,
 		{
 			int32	cmpval;
 
-			cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
-													 key->partcollation[0],
+			cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[j],
+													 key->partcollation[j],
 													 b1->datums[i][j],
 													 b2->datums[i][j]));
 			if (cmpval != 0)
@@ -976,7 +981,7 @@ RelationGetPartitionDispatchInfo(Relation rel, int lockmode,
 		PartitionDesc	partdesc = RelationGetPartitionDesc(partrel);
 
 		/*
-		 * If this partition is a partitined table, add its children to to the
+		 * If this partition is a partitioned table, add its children to the
 		 * end of the list, so that they are processed as well.
 		 */
 		if (partdesc)
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index abfb46b..c77b216 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1990,9 +1990,9 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
 		pfree(newattno);
 
 		/*
-		 * Close the parent rel, but keep our ShareUpdateExclusiveLock on it
-		 * until xact commit.  That will prevent someone else from deleting or
-		 * ALTERing the parent before the child is committed.
+		 * Close the parent rel, but keep our lock on it until xact commit.
+		 * That will prevent someone else from deleting or ALTERing the parent
+		 * before the child is committed.
 		 */
 		heap_close(relation, NoLock);
 	}
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index c958092..2da7ae3 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1192,7 +1192,7 @@ equalPartitionDescs(PartitionKey key, PartitionDesc partdesc1,
 
 		/*
 		 * Now compare partition bound collections.  The logic to iterate over
-		 * the collections is local to partition.c.
+		 * the collections is private to partition.c.
 		 */
 		if (partdesc1->boundinfo != NULL)
 		{
@@ -2604,7 +2604,9 @@ RelationClearRelation(Relation relation, bool rebuild)
 		SWAPFIELD(Oid, rd_toastoid);
 		/* pgstat_info must be preserved */
 		SWAPFIELD(struct PgStat_TableStatus *, pgstat_info);
-
+		/* partition key must be preserved */
+		SWAPFIELD(PartitionKey, rd_partkey);
+		SWAPFIELD(MemoryContext, rd_partkeycxt);
 		/* preserve old partdesc if no logical change */
 		if (keep_partdesc)
 		{
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to