Thanks for the review.

On 2017/08/16 2:27, Robert Haas wrote:
> On Wed, Aug 9, 2017 at 10:11 PM, Amit Langote
> <langote_amit...@lab.ntt.co.jp> wrote:
>>> P.S. While I haven't reviewed 0002 in detail, I think the concept of
>>> minimizing what needs to be built in RelationGetPartitionDispatchInfo
>>> is a very good idea.
>>
>> I put this patch ahead in the list and so it's now 0001.
>
> I think what you've currently got as
> 0003-Relieve-RelationGetPartitionDispatchInfo-of-doing-an.patch is a
> bug fix that probably needs to be back-patched into v10, so it should
> come first.

That makes sense. That patch is now 0001. Checked that it can be
back-patched to REL_10_STABLE.

> I think 0002-Teach-pg_inherits.c-a-bit-about-partitioning.patch and
> 0005-Store-in-pg_inherits-if-a-child-is-a-partitioned-tab.patch should
> be merged into one patch and that should come next,

Merged the two into one: attached 0002.

> followed by
> 0004-Teach-expand_inherited_rtentry-to-use-partition-boun.patch and

This one is now 0003.

> finally what you now have as
> 0001-Decouple-RelationGetPartitionDispatchInfo-from-execu.patch.

And 0004.

> This patch series is blocking a bunch of other things, so it would be
> nice if you could press forward with this quickly.

Attached updated patches.

Thanks,
Amit
From 23a3e291001394ffa2b79b34b32c582cb4898e87 Mon Sep 17 00:00:00 2001 From: amit <amitlangot...@gmail.com> Date: Wed, 16 Aug 2017 11:36:14 +0900 Subject: [PATCH 1/4] Relieve RelationGetPartitionDispatchInfo() of doing any locking Anyone who wants to call RelationGetPartitionDispatchInfo() must first acquire locks using find_all_inheritors. Doing it this way gets rid of the possibility of a deadlock when partitions are concurrently locked, because RelationGetPartitionDispatchInfo would lock the partitions in one order and find_all_inheritors would in another. Reported-by: Amit Khandekar, Robert Haas Reports: https://postgr.es/m/CAJ3gD9fdjk2O8aPMXidCeYeB-mFB%3DwY9ZLfe8cQOfG4bTqVGyQ%40mail.gmail.com https://postgr.es/m/CA%2BTgmobwbh12OJerqAGyPEjb_%2B2y7T0nqRKTcjed6L4NTET6Fg%40mail.gmail.com --- src/backend/catalog/partition.c | 55 ++++++++++++++++++++++------------------- src/backend/executor/execMain.c | 18 +++++++++----- src/include/catalog/partition.h | 3 +-- 3 files changed, 42 insertions(+), 34 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index c1a307c8d3..96a64ce6b2 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -999,12 +999,16 @@ get_partition_qual_relid(Oid relid) * RelationGetPartitionDispatchInfo * Returns information necessary to route tuples down a partition tree * - * All the partitions will be locked with lockmode, unless it is NoLock. - * A list of the OIDs of all the leaf partitions of rel is returned in - * *leaf_part_oids. + * The number of elements in the returned array (that is, the number of + * PartitionDispatch objects for the partitioned tables in the partition tree) + * is returned in *num_parted and a list of the OIDs of all the leaf + * partitions of rel is returned in *leaf_part_oids. + * + * All the relations in the partition tree (including 'rel') must have been + * locked (using at least the AccessShareLock) by the caller. 
*/ PartitionDispatch * -RelationGetPartitionDispatchInfo(Relation rel, int lockmode, +RelationGetPartitionDispatchInfo(Relation rel, int *num_parted, List **leaf_part_oids) { PartitionDispatchData **pd; @@ -1019,14 +1023,18 @@ RelationGetPartitionDispatchInfo(Relation rel, int lockmode, offset; /* - * Lock partitions and make a list of the partitioned ones to prepare - * their PartitionDispatch objects below. + * We rely on the relcache to traverse the partition tree to build both + * the leaf partition OIDs list and the array of PartitionDispatch objects + * for the partitioned tables in the tree. That means every partitioned + * table in the tree must be locked, which is fine since we require the + * caller to lock all the partitions anyway. * - * Cannot use find_all_inheritors() here, because then the order of OIDs - * in parted_rels list would be unknown, which does not help, because we - * assign indexes within individual PartitionDispatch in an order that is - * predetermined (determined by the order of OIDs in individual partition - * descriptors). + * For every partitioned table in the tree, starting with the root + * partitioned table, add its relcache entry to parted_rels, while also + * queuing its partitions (in the order in which they appear in the + * partition descriptor) to be looked at later in the same loop. This is + * a bit tricky but works because the foreach() macro doesn't fetch the + * next list element until the bottom of the loop. 
*/ *num_parted = 1; parted_rels = list_make1(rel); @@ -1035,29 +1043,24 @@ RelationGetPartitionDispatchInfo(Relation rel, int lockmode, APPEND_REL_PARTITION_OIDS(rel, all_parts, all_parents); forboth(lc1, all_parts, lc2, all_parents) { - Relation partrel = heap_open(lfirst_oid(lc1), lockmode); + Oid partrelid = lfirst_oid(lc1); Relation parent = lfirst(lc2); - PartitionDesc partdesc = RelationGetPartitionDesc(partrel); - /* - * If this partition is a partitioned table, add its children to the - * end of the list, so that they are processed as well. - */ - if (partdesc) + if (get_rel_relkind(partrelid) == RELKIND_PARTITIONED_TABLE) { + /* + * Already locked by the caller. Note that it is the + * responsibility of the caller to close the below relcache entry, + * once done using the information being collected here (for + * example, in ExecEndModifyTable). + */ + Relation partrel = heap_open(partrelid, NoLock); + (*num_parted)++; parted_rels = lappend(parted_rels, partrel); parted_rel_parents = lappend(parted_rel_parents, parent); APPEND_REL_PARTITION_OIDS(partrel, all_parts, all_parents); } - else - heap_close(partrel, NoLock); - - /* - * We keep the partitioned ones open until we're done using the - * information being collected here (for example, see - * ExecEndModifyTable). 
- */ } /* diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index 6671a25ffb..eeadd8bec5 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -43,6 +43,7 @@ #include "access/xact.h" #include "catalog/namespace.h" #include "catalog/partition.h" +#include "catalog/pg_inherits_fn.h" #include "catalog/pg_publication.h" #include "commands/matview.h" #include "commands/trigger.h" @@ -3248,10 +3249,16 @@ ExecSetupPartitionTupleRouting(Relation rel, ListCell *cell; int i; ResultRelInfo *leaf_part_rri; + List *all_parts; - /* Get the tuple-routing information and lock partitions */ - *pd = RelationGetPartitionDispatchInfo(rel, RowExclusiveLock, num_parted, - &leaf_parts); + /* + * Get the information about the partition tree after locking all the + * partitions. + */ + all_parts = find_all_inheritors(RelationGetRelid(rel), RowExclusiveLock, + NULL); + list_free(all_parts); + *pd = RelationGetPartitionDispatchInfo(rel, num_parted, &leaf_parts); *num_partitions = list_length(leaf_parts); *partitions = (ResultRelInfo *) palloc(*num_partitions * sizeof(ResultRelInfo)); @@ -3274,9 +3281,8 @@ ExecSetupPartitionTupleRouting(Relation rel, TupleDesc part_tupdesc; /* - * We locked all the partitions above including the leaf partitions. - * Note that each of the relations in *partitions are eventually - * closed by the caller. + * All the partitions were locked above. Note that the relcache + * entries will be closed by ExecEndModifyTable(). 
*/ partrel = heap_open(lfirst_oid(cell), NoLock); part_tupdesc = RelationGetDescr(partrel); diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h index bef7a0f5fb..2283c675e9 100644 --- a/src/include/catalog/partition.h +++ b/src/include/catalog/partition.h @@ -88,8 +88,7 @@ extern Expr *get_partition_qual_relid(Oid relid); /* For tuple routing */ extern PartitionDispatch *RelationGetPartitionDispatchInfo(Relation rel, - int lockmode, int *num_parted, - List **leaf_part_oids); + int *num_parted, List **leaf_part_oids); extern void FormPartitionKeyDatum(PartitionDispatch pd, TupleTableSlot *slot, EState *estate, -- 2.11.0
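The traversal that patch 1/4 leaves in RelationGetPartitionDispatchInfo() can be illustrated outside the backend. The following is only a minimal sketch under stated assumptions: ToyRel and toy_collect_partitioned are hypothetical names invented for this illustration, a plain array stands in for the List of relcache entries, and a "locked" flag stands in for the locks the caller is now expected to take beforehand (for example via find_all_inheritors). It is not the PostgreSQL code itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one relation in a partition tree. */
typedef struct ToyRel
{
    unsigned int oid;
    int          is_partitioned;        /* nonzero if it has partitions */
    int          locked;                /* set by the caller beforehand */
    int          nchildren;
    const struct ToyRel *children[4];   /* in partition descriptor order */
} ToyRel;

/*
 * Collect the partitioned tables of the tree rooted at 'root', root first.
 * The 'parted' array is both the result and the work queue: entries
 * appended while scanning one table are picked up by later iterations of
 * the same loop.  Returns the number of entries written (at most 'max').
 */
static int
toy_collect_partitioned(const ToyRel *root, const ToyRel **parted, int max)
{
    int nparted = 0;
    int i;

    assert(max >= 1);
    assert(root->locked);
    parted[nparted++] = root;

    for (i = 0; i < nparted; i++)
    {
        const ToyRel *rel = parted[i];
        int j;

        for (j = 0; j < rel->nchildren; j++)
        {
            const ToyRel *child = rel->children[j];

            /* No locking here: the caller must have locked everything. */
            assert(child->locked);

            if (child->is_partitioned && nparted < max)
                parted[nparted++] = child;
        }
    }

    return nparted;
}
```

The point of the sketch is the division of labour: the function only reads the tree and asserts that locks are already held, so the order in which it visits relations can no longer conflict with the order in which some other code path acquires locks.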
From e0ffad29a97f8ab2c2ee9bff1a4c1c6168c08532 Mon Sep 17 00:00:00 2001 From: amit <amitlangot...@gmail.com> Date: Tue, 8 Aug 2017 18:42:30 +0900 Subject: [PATCH 2/4] Teach pg_inherits.c a bit about partitioning Both find_inheritance_children and find_all_inheritors now list partitioned child tables before non-partitioned ones and return the number of partitioned tables in an optional output argument We also now store in pg_inherits, when adding a new child, if the child is a partitioned table. Per design idea from Robert Haas --- contrib/sepgsql/dml.c | 2 +- doc/src/sgml/catalogs.sgml | 10 +++ src/backend/catalog/partition.c | 2 +- src/backend/catalog/pg_inherits.c | 157 ++++++++++++++++++++++++++------- src/backend/commands/analyze.c | 3 +- src/backend/commands/lockcmds.c | 2 +- src/backend/commands/publicationcmds.c | 2 +- src/backend/commands/tablecmds.c | 56 +++++++----- src/backend/commands/vacuum.c | 3 +- src/backend/executor/execMain.c | 2 +- src/backend/optimizer/prep/prepunion.c | 2 +- src/include/catalog/pg_inherits.h | 4 +- src/include/catalog/pg_inherits_fn.h | 5 +- 13 files changed, 187 insertions(+), 63 deletions(-) diff --git a/contrib/sepgsql/dml.c b/contrib/sepgsql/dml.c index b643720e36..6fc279805c 100644 --- a/contrib/sepgsql/dml.c +++ b/contrib/sepgsql/dml.c @@ -333,7 +333,7 @@ sepgsql_dml_privileges(List *rangeTabls, bool abort_on_violation) if (!rte->inh) tableIds = list_make1_oid(rte->relid); else - tableIds = find_all_inheritors(rte->relid, NoLock, NULL); + tableIds = find_all_inheritors(rte->relid, NoLock, NULL, NULL); foreach(li, tableIds) { diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index ef7054cf26..c1d5a75020 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -3894,6 +3894,16 @@ SCRAM-SHA-256$<replaceable><iteration count></>:<replaceable><salt>< inherited columns are to be arranged. The count starts at 1. 
</entry> </row> + + <row> + <entry><structfield>inhchildparted</structfield></entry> + <entry><type>bool</type></entry> + <entry></entry> + <entry> + This is <literal>true</> if the child table is a partitioned table, + <literal>false</> otherwise + </entry> + </row> </tbody> </tgroup> </table> diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index 96a64ce6b2..efc025ec42 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -178,7 +178,7 @@ RelationBuildPartitionDesc(Relation rel) return; /* Get partition oids from pg_inherits */ - inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock); + inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock, NULL); /* Collect bound spec nodes in a list */ i = 0; diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c index 245a374fc9..0285bc3c33 100644 --- a/src/backend/catalog/pg_inherits.c +++ b/src/backend/catalog/pg_inherits.c @@ -33,6 +33,8 @@ #include "utils/syscache.h" #include "utils/tqual.h" +static int32 inhchildinfo_cmp(const void *p1, const void *p2); + /* * Entry of a hash table used in find_all_inheritors. See below. */ @@ -42,6 +44,30 @@ typedef struct SeenRelsEntry ListCell *numparents_cell; /* corresponding list cell */ } SeenRelsEntry; +/* Information about one inheritance child table. */ +typedef struct InhChildInfo +{ + Oid relid; + bool is_partitioned; +} InhChildInfo; + +#define OID_CMP(o1, o2) \ + ((o1) < (o2) ? -1 : ((o1) > (o2) ? 1 : 0)); + +static int32 +inhchildinfo_cmp(const void *p1, const void *p2) +{ + InhChildInfo c1 = *((const InhChildInfo *) p1); + InhChildInfo c2 = *((const InhChildInfo *) p2); + + if (c1.is_partitioned && !c2.is_partitioned) + return -1; + if (!c1.is_partitioned && c2.is_partitioned) + return 1; + + return OID_CMP(c1.relid, c2.relid); +} + /* * find_inheritance_children * @@ -54,7 +80,8 @@ typedef struct SeenRelsEntry * against possible DROPs of child relations. 
*/ List * -find_inheritance_children(Oid parentrelId, LOCKMODE lockmode) +find_inheritance_children(Oid parentrelId, LOCKMODE lockmode, + int *num_partitioned_children) { List *list = NIL; Relation relation; @@ -62,9 +89,10 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode) ScanKeyData key[1]; HeapTuple inheritsTuple; Oid inhrelid; - Oid *oidarr; - int maxoids, - numoids, + InhChildInfo *inhchildren; + int maxchildren, + numchildren, + my_num_partitioned_children, i; /* @@ -77,9 +105,10 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode) /* * Scan pg_inherits and build a working array of subclass OIDs. */ - maxoids = 32; - oidarr = (Oid *) palloc(maxoids * sizeof(Oid)); - numoids = 0; + maxchildren = 32; + inhchildren = (InhChildInfo *) palloc(maxchildren * sizeof(InhChildInfo)); + numchildren = 0; + my_num_partitioned_children = 0; relation = heap_open(InheritsRelationId, AccessShareLock); @@ -93,34 +122,49 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode) while ((inheritsTuple = systable_getnext(scan)) != NULL) { + bool is_partitioned; + inhrelid = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhrelid; - if (numoids >= maxoids) + is_partitioned = ((Form_pg_inherits) + GETSTRUCT(inheritsTuple))->inhchildparted; + + if (numchildren >= maxchildren) { - maxoids *= 2; - oidarr = (Oid *) repalloc(oidarr, maxoids * sizeof(Oid)); + maxchildren *= 2; + inhchildren = (InhChildInfo *) repalloc(inhchildren, + maxchildren * sizeof(InhChildInfo)); } - oidarr[numoids++] = inhrelid; + inhchildren[numchildren].relid = inhrelid; + inhchildren[numchildren].is_partitioned = is_partitioned; + + if (is_partitioned) + my_num_partitioned_children++; + numchildren++; } systable_endscan(scan); heap_close(relation, AccessShareLock); + if (num_partitioned_children) + *num_partitioned_children = my_num_partitioned_children; + /* * If we found more than one child, sort them by OID. 
This ensures * reasonably consistent behavior regardless of the vagaries of an * indexscan. This is important since we need to be sure all backends * lock children in the same order to avoid needless deadlocks. */ - if (numoids > 1) - qsort(oidarr, numoids, sizeof(Oid), oid_cmp); + if (numchildren > 1) + qsort(inhchildren, numchildren, sizeof(InhChildInfo), + inhchildinfo_cmp); /* * Acquire locks and build the result list. */ - for (i = 0; i < numoids; i++) + for (i = 0; i < numchildren; i++) { - inhrelid = oidarr[i]; + inhrelid = inhchildren[i].relid; if (lockmode != NoLock) { @@ -144,7 +188,7 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode) list = lappend_oid(list, inhrelid); } - pfree(oidarr); + pfree(inhchildren); return list; } @@ -159,18 +203,28 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode) * given rel. * * The specified lock type is acquired on all child relations (but not on the - * given rel; caller should already have locked it). If lockmode is NoLock - * then no locks are acquired, but caller must beware of race conditions - * against possible DROPs of child relations. + * given rel; caller should already have locked it), unless + * lock_only_partitioned_children is specified, in which case, only the + * child relations that are partitioned tables are locked. If lockmode is + * NoLock then no locks are acquired, but caller must beware of race + * conditions against possible DROPs of child relations. + * + * Returned list of OIDs is such that all the partitioned tables in the tree + * appear at the head of the list. If num_partitioned_children is non-NULL, + * *num_partitioned_children returns the number of partitioned child table + * OIDs at the head of the list. 
*/ List * -find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents) +find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, + List **numparents, int *num_partitioned_children) { /* hash table for O(1) rel_oid -> rel_numparents cell lookup */ HTAB *seen_rels; HASHCTL ctl; List *rels_list, - *rel_numparents; + *rel_numparents, + *partitioned_rels_list, + *other_rels_list; ListCell *l; memset(&ctl, 0, sizeof(ctl)); @@ -185,31 +239,71 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents) /* * We build a list starting with the given rel and adding all direct and - * indirect children. We can use a single list as both the record of - * already-found rels and the agenda of rels yet to be scanned for more - * children. This is a bit tricky but works because the foreach() macro - * doesn't fetch the next list element until the bottom of the loop. + * indirect children. We can use a single list (rels_list) as both the + * record of already-found rels and the agenda of rels yet to be scanned + * for more children. This is a bit tricky but works because the foreach() + * macro doesn't fetch the next list element until the bottom of the loop. + * + * partitioned_rels_list will contain the OIDs of the partitioned child + * tables and other_rels_list will contain the OIDs of the non-partitioned + * child tables. Result list will be generated by concatenating the two + * lists together with partitioned_rels_list appearing first.
*/ rels_list = list_make1_oid(parentrelId); + partitioned_rels_list = list_make1_oid(parentrelId); + other_rels_list = NIL; rel_numparents = list_make1_int(0); + if (num_partitioned_children) + *num_partitioned_children = 0; + foreach(l, rels_list) { Oid currentrel = lfirst_oid(l); List *currentchildren; - ListCell *lc; + ListCell *lc, + *first_nonpartitioned_child; + int cur_num_partitioned_children = 0, + i; /* Get the direct children of this rel */ - currentchildren = find_inheritance_children(currentrel, lockmode); + currentchildren = find_inheritance_children(currentrel, lockmode, + &cur_num_partitioned_children); + + if (num_partitioned_children) + *num_partitioned_children += cur_num_partitioned_children; + + /* + * Append partitioned children to rels_list and partitioned_rels_list. + * We know for sure that partitioned children don't need the + * de-duplication logic in the following loop, because partitioned + * tables are not allowed to participate in multiple inheritance. + */ + i = 0; + foreach(lc, currentchildren) + { + if (i < cur_num_partitioned_children) + { + Oid child_oid = lfirst_oid(lc); + + rels_list = lappend_oid(rels_list, child_oid); + partitioned_rels_list = lappend_oid(partitioned_rels_list, + child_oid); + } + else + break; + i++; + } + first_nonpartitioned_child = lc; /* * Add to the queue only those children not already seen. This avoids * making duplicate entries in case of multiple inheritance paths from * the same parent. (It'll also keep us from getting into an infinite * loop, though theoretically there can't be any cycles in the - * inheritance graph anyway.) + * inheritance graph anyway.) Also, add them to the other_rels_list. */ - foreach(lc, currentchildren) + for_each_cell(lc, first_nonpartitioned_child) { Oid child_oid = lfirst_oid(lc); bool found; @@ -225,6 +319,7 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents) { /* if it's not there, add it. expect 1 parent, initially.
*/ rels_list = lappend_oid(rels_list, child_oid); + other_rels_list = lappend_oid(other_rels_list, child_oid); rel_numparents = lappend_int(rel_numparents, 1); hash_entry->numparents_cell = rel_numparents->tail; } @@ -237,8 +332,10 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents) list_free(rel_numparents); hash_destroy(seen_rels); + list_free(rels_list); - return rels_list; + /* List partitioned child tables before non-partitioned ones. */ + return list_concat(partitioned_rels_list, other_rels_list); } diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c index 2b638271b3..ae8ce71e1c 100644 --- a/src/backend/commands/analyze.c +++ b/src/backend/commands/analyze.c @@ -1282,7 +1282,8 @@ acquire_inherited_sample_rows(Relation onerel, int elevel, * the children. */ tableOIDs = - find_all_inheritors(RelationGetRelid(onerel), AccessShareLock, NULL); + find_all_inheritors(RelationGetRelid(onerel), AccessShareLock, NULL, + NULL); /* * Check that there's at least one descendant, else fail. 
This could diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c index 9fe9e022b0..529f244f7e 100644 --- a/src/backend/commands/lockcmds.c +++ b/src/backend/commands/lockcmds.c @@ -112,7 +112,7 @@ LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait) List *children; ListCell *lc; - children = find_inheritance_children(reloid, NoLock); + children = find_inheritance_children(reloid, NoLock, NULL); foreach(lc, children) { diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c index 610cb499d2..64179ea3ef 100644 --- a/src/backend/commands/publicationcmds.c +++ b/src/backend/commands/publicationcmds.c @@ -516,7 +516,7 @@ OpenTableList(List *tables) List *children; children = find_all_inheritors(myrelid, ShareUpdateExclusiveLock, - NULL); + NULL, NULL); foreach(child, children) { diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 513a9ec485..a35d7810f2 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -299,10 +299,10 @@ static bool MergeCheckConstraint(List *constraints, char *name, Node *expr); static void MergeAttributesIntoExisting(Relation child_rel, Relation parent_rel); static void MergeConstraintsIntoExisting(Relation child_rel, Relation parent_rel); static void StoreCatalogInheritance(Oid relationId, List *supers, - bool child_is_partition); + bool child_is_partition, bool child_is_partitioned); static void StoreCatalogInheritance1(Oid relationId, Oid parentOid, int16 seqNumber, Relation inhRelation, - bool child_is_partition); + bool child_is_partition, bool child_is_partitioned); static int findAttrByName(const char *attributeName, List *schema); static void AlterIndexNamespaces(Relation classRel, Relation rel, Oid oldNspOid, Oid newNspOid, ObjectAddresses *objsMoved); @@ -746,7 +746,8 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId, typaddress); /* Store inheritance information for new rel. 
*/ - StoreCatalogInheritance(relationId, inheritOids, stmt->partbound != NULL); + StoreCatalogInheritance(relationId, inheritOids, stmt->partbound != NULL, + relkind == RELKIND_PARTITIONED_TABLE); /* * We must bump the command counter to make the newly-created relation @@ -1231,7 +1232,8 @@ ExecuteTruncate(TruncateStmt *stmt) ListCell *child; List *children; - children = find_all_inheritors(myrelid, AccessExclusiveLock, NULL); + children = find_all_inheritors(myrelid, AccessExclusiveLock, NULL, + NULL); foreach(child, children) { @@ -2297,7 +2299,7 @@ MergeCheckConstraint(List *constraints, char *name, Node *expr) */ static void StoreCatalogInheritance(Oid relationId, List *supers, - bool child_is_partition) + bool child_is_partition, bool child_is_partitioned) { Relation relation; int16 seqNumber; @@ -2328,7 +2330,7 @@ StoreCatalogInheritance(Oid relationId, List *supers, Oid parentOid = lfirst_oid(entry); StoreCatalogInheritance1(relationId, parentOid, seqNumber, relation, - child_is_partition); + child_is_partition, child_is_partitioned); seqNumber++; } @@ -2342,7 +2344,7 @@ StoreCatalogInheritance(Oid relationId, List *supers, static void StoreCatalogInheritance1(Oid relationId, Oid parentOid, int16 seqNumber, Relation inhRelation, - bool child_is_partition) + bool child_is_partition, bool child_is_partitioned) { TupleDesc desc = RelationGetDescr(inhRelation); Datum values[Natts_pg_inherits]; @@ -2357,6 +2359,8 @@ StoreCatalogInheritance1(Oid relationId, Oid parentOid, values[Anum_pg_inherits_inhrelid - 1] = ObjectIdGetDatum(relationId); values[Anum_pg_inherits_inhparent - 1] = ObjectIdGetDatum(parentOid); values[Anum_pg_inherits_inhseqno - 1] = Int16GetDatum(seqNumber); + values[Anum_pg_inherits_inhchildparted - 1] = + BoolGetDatum(child_is_partitioned); memset(nulls, 0, sizeof(nulls)); @@ -2556,7 +2560,7 @@ renameatt_internal(Oid myrelid, * outside the inheritance hierarchy being processed. 
*/ child_oids = find_all_inheritors(myrelid, AccessExclusiveLock, - &child_numparents); + &child_numparents, NULL); /* * find_all_inheritors does the recursive search of the inheritance @@ -2583,7 +2587,7 @@ renameatt_internal(Oid myrelid, * expected_parents will only be 0 if we are not already recursing. */ if (expected_parents == 0 && - find_inheritance_children(myrelid, NoLock) != NIL) + find_inheritance_children(myrelid, NoLock, NULL) != NIL) ereport(ERROR, (errcode(ERRCODE_INVALID_TABLE_DEFINITION), errmsg("inherited column \"%s\" must be renamed in child tables too", @@ -2766,7 +2770,7 @@ rename_constraint_internal(Oid myrelid, *li; child_oids = find_all_inheritors(myrelid, AccessExclusiveLock, - &child_numparents); + &child_numparents, NULL); forboth(lo, child_oids, li, child_numparents) { @@ -2782,7 +2786,7 @@ rename_constraint_internal(Oid myrelid, else { if (expected_parents == 0 && - find_inheritance_children(myrelid, NoLock) != NIL) + find_inheritance_children(myrelid, NoLock, NULL) != NIL) ereport(ERROR, (errcode(ERRCODE_INVALID_TABLE_DEFINITION), errmsg("inherited constraint \"%s\" must be renamed in child tables too", @@ -4790,7 +4794,7 @@ ATSimpleRecursion(List **wqueue, Relation rel, ListCell *child; List *children; - children = find_all_inheritors(relid, lockmode, NULL); + children = find_all_inheritors(relid, lockmode, NULL, NULL); /* * find_all_inheritors does the recursive search of the inheritance @@ -5199,7 +5203,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel, */ if (colDef->identity && recurse && - find_inheritance_children(myrelid, NoLock) != NIL) + find_inheritance_children(myrelid, NoLock, NULL) != NIL) ereport(ERROR, (errcode(ERRCODE_INVALID_TABLE_DEFINITION), errmsg("cannot recursively add identity column to table that has child tables"))); @@ -5405,7 +5409,8 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel, * routines, we have to do this one level of recursion at a time; we can't * use 
find_all_inheritors to do it in one pass. */ - children = find_inheritance_children(RelationGetRelid(rel), lockmode); + children = find_inheritance_children(RelationGetRelid(rel), lockmode, + NULL); /* * If we are told not to recurse, there had better not be any child @@ -6524,7 +6529,8 @@ ATExecDropColumn(List **wqueue, Relation rel, const char *colName, * routines, we have to do this one level of recursion at a time; we can't * use find_all_inheritors to do it in one pass. */ - children = find_inheritance_children(RelationGetRelid(rel), lockmode); + children = find_inheritance_children(RelationGetRelid(rel), lockmode, + NULL); if (children) { @@ -6958,7 +6964,8 @@ ATAddCheckConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel, * routines, we have to do this one level of recursion at a time; we can't * use find_all_inheritors to do it in one pass. */ - children = find_inheritance_children(RelationGetRelid(rel), lockmode); + children = find_inheritance_children(RelationGetRelid(rel), lockmode, + NULL); /* * Check if ONLY was specified with ALTER TABLE. If so, allow the @@ -7677,7 +7684,7 @@ ATExecValidateConstraint(Relation rel, char *constrName, bool recurse, */ if (!recursing && !con->connoinherit) children = find_all_inheritors(RelationGetRelid(rel), - lockmode, NULL); + lockmode, NULL, NULL); /* * For CHECK constraints, we must ensure that we only mark the @@ -8560,7 +8567,8 @@ ATExecDropConstraint(Relation rel, const char *constrName, * use find_all_inheritors to do it in one pass. 
*/ if (!is_no_inherit_constraint) - children = find_inheritance_children(RelationGetRelid(rel), lockmode); + children = find_inheritance_children(RelationGetRelid(rel), lockmode, + NULL); else children = NIL; @@ -8849,7 +8857,7 @@ ATPrepAlterColumnType(List **wqueue, ListCell *child; List *children; - children = find_all_inheritors(relid, lockmode, NULL); + children = find_all_inheritors(relid, lockmode, NULL, NULL); /* * find_all_inheritors does the recursive search of the inheritance @@ -8900,7 +8908,8 @@ ATPrepAlterColumnType(List **wqueue, } } else if (!recursing && - find_inheritance_children(RelationGetRelid(rel), NoLock) != NIL) + find_inheritance_children(RelationGetRelid(rel), + NoLock, NULL) != NIL) ereport(ERROR, (errcode(ERRCODE_INVALID_TABLE_DEFINITION), errmsg("type of inherited column \"%s\" must be changed in child tables too", @@ -11010,7 +11019,7 @@ ATExecAddInherit(Relation child_rel, RangeVar *parent, LOCKMODE lockmode) * We use weakest lock we can on child's children, namely AccessShareLock. */ children = find_all_inheritors(RelationGetRelid(child_rel), - AccessShareLock, NULL); + AccessShareLock, NULL, NULL); if (list_member_oid(children, RelationGetRelid(parent_rel))) ereport(ERROR, @@ -11119,6 +11128,8 @@ CreateInheritance(Relation child_rel, Relation parent_rel) inhseqno + 1, catalogRelation, parent_rel->rd_rel->relkind == + RELKIND_PARTITIONED_TABLE, + child_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE); /* Now we're done with pg_inherits */ @@ -13516,7 +13527,8 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd) * weaker lock now and the stronger one only when needed. 
*/ attachrel_children = find_all_inheritors(RelationGetRelid(attachrel), - AccessExclusiveLock, NULL); + AccessExclusiveLock, NULL, + NULL); if (list_member_oid(attachrel_children, RelationGetRelid(rel))) ereport(ERROR, (errcode(ERRCODE_DUPLICATE_TABLE), diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c index faa181207a..e2e5ffce42 100644 --- a/src/backend/commands/vacuum.c +++ b/src/backend/commands/vacuum.c @@ -430,7 +430,8 @@ get_rel_oids(Oid relid, const RangeVar *vacrel) oldcontext = MemoryContextSwitchTo(vac_context); if (include_parts) oid_list = list_concat(oid_list, - find_all_inheritors(relid, NoLock, NULL)); + find_all_inheritors(relid, NoLock, NULL, + NULL)); else oid_list = lappend_oid(oid_list, relid); MemoryContextSwitchTo(oldcontext); diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index eeadd8bec5..3db8b6f971 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -3256,7 +3256,7 @@ ExecSetupPartitionTupleRouting(Relation rel, * partitions. 
*/ all_parts = find_all_inheritors(RelationGetRelid(rel), RowExclusiveLock, - NULL); + NULL, NULL); list_free(all_parts); *pd = RelationGetPartitionDispatchInfo(rel, num_parted, &leaf_parts); *num_partitions = list_length(leaf_parts); diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c index 6d8f8938b2..a59081103a 100644 --- a/src/backend/optimizer/prep/prepunion.c +++ b/src/backend/optimizer/prep/prepunion.c @@ -1424,7 +1424,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) lockmode = AccessShareLock; /* Scan for all members of inheritance set, acquire needed locks */ - inhOIDs = find_all_inheritors(parentOID, lockmode, NULL); + inhOIDs = find_all_inheritors(parentOID, lockmode, NULL, NULL); /* * Check that there's at least one descendant, else treat as no-child diff --git a/src/include/catalog/pg_inherits.h b/src/include/catalog/pg_inherits.h index 26bfab5db6..2c4ef246a4 100644 --- a/src/include/catalog/pg_inherits.h +++ b/src/include/catalog/pg_inherits.h @@ -33,6 +33,7 @@ CATALOG(pg_inherits,2611) BKI_WITHOUT_OIDS Oid inhrelid; Oid inhparent; int32 inhseqno; + bool inhchildparted; } FormData_pg_inherits; /* ---------------- @@ -46,10 +47,11 @@ typedef FormData_pg_inherits *Form_pg_inherits; * compiler constants for pg_inherits * ---------------- */ -#define Natts_pg_inherits 3 +#define Natts_pg_inherits 4 #define Anum_pg_inherits_inhrelid 1 #define Anum_pg_inherits_inhparent 2 #define Anum_pg_inherits_inhseqno 3 +#define Anum_pg_inherits_inhchildparted 4 /* ---------------- * pg_inherits has no initial contents diff --git a/src/include/catalog/pg_inherits_fn.h b/src/include/catalog/pg_inherits_fn.h index 7743388899..8f371acae7 100644 --- a/src/include/catalog/pg_inherits_fn.h +++ b/src/include/catalog/pg_inherits_fn.h @@ -17,9 +17,10 @@ #include "nodes/pg_list.h" #include "storage/lock.h" -extern List *find_inheritance_children(Oid parentrelId, LOCKMODE lockmode); +extern List 
*find_inheritance_children(Oid parentrelId, LOCKMODE lockmode, + int *num_partitioned_children); extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, - List **parents); + List **parents, int *num_partitioned_children); extern bool has_subclass(Oid relationId); extern bool has_superclass(Oid relationId); extern bool typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId); -- 2.11.0
From 928eabebed8806f2ead413744ac196bb9caef646 Mon Sep 17 00:00:00 2001
From: amit <amitlangot...@gmail.com>
Date: Wed, 9 Aug 2017 15:52:36 +0900
Subject: [PATCH 3/4] Teach expand_inherited_rtentry to use partition bound order

After locking the child tables using find_all_inheritors, we discard
the list of child table OIDs that it generates and rebuild the same
using the information returned by RelationGetPartitionDispatchInfo.
---
 src/backend/optimizer/prep/prepunion.c | 51 ++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index a59081103a..734a7e55df 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -33,6 +33,7 @@
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
+#include "catalog/partition.h"
 #include "catalog/pg_inherits_fn.h"
 #include "catalog/pg_type.h"
 #include "miscadmin.h"
@@ -1452,6 +1453,56 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
 	 */
 	oldrelation = heap_open(parentOID, NoLock);
 
+	/*
+	 * For partitioned tables, we arrange the child table OIDs such that they
+	 * appear in the partition bound order.
+	 */
+	if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+	{
+		List	   *leaf_part_oids;
+		int			num_parted,
+					i;
+		PartitionDispatch *pds;
+
+		/* Discard the original list. */
+		list_free(inhOIDs);
+		inhOIDs = NIL;
+
+		/* Request partitioning information. */
+		pds = RelationGetPartitionDispatchInfo(oldrelation, &num_parted,
+											   &leaf_part_oids);
+
+		/*
+		 * First collect the partitioned child table OIDs, which includes the
+		 * root parent at the head.
+		 */
+		for (i = 0; i < num_parted; i++)
+		{
+			PartitionDispatch pd = pds[i];
+
+			inhOIDs = lappend_oid(inhOIDs, RelationGetRelid(pd->reldesc));
+		}
+
+		/* Concatenate the leaf partition OIDs. */
+		inhOIDs = list_concat(inhOIDs, leaf_part_oids);
+
+		/*
+		 * Release the resources that RelationGetPartitionDispatchInfo
+		 * acquired for us but we don't really need in this case.  Note that
+		 * we don't touch the root partitioned table itself by starting the
+		 * loop with 1, not 0.
+		 */
+		for (i = 1; i < num_parted; i++)
+		{
+			PartitionDispatch pd = pds[i];
+
+			heap_close(pd->reldesc, NoLock);
+			ExecDropSingleTupleTableSlot(pd->tupslot);
+			if (pd->tupmap)
+				pfree(pd->tupmap);
+		}
+	}
+
 	/* Scan the inheritance set and expand it */
 	appinfos = NIL;
 	has_child = false;
-- 
2.11.0
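For anyone trying to picture the OID order that the patch above ends up with, here is a rough standalone sketch. It is not code from the patch: ToyRel, toy_expand_order() and toy_demo_tree() are hypothetical names made up for illustration, and the arrays stand in for PostgreSQL's List. It only mimics the shape of the result: partitioned tables first (root at the head, breadth-first), followed by the leaf partitions, with each table's partitions visited in bound order.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RELS  16
#define MAX_PARTS 4

/* Hypothetical stand-in for a relation; not a PostgreSQL type. */
typedef struct ToyRel
{
	unsigned int oid;
	int			is_partitioned;		/* 1 if this is a partitioned table */
	int			nparts;				/* number of entries used in parts[] */
	const struct ToyRel *parts[MAX_PARTS];	/* partitions in bound order */
} ToyRel;

/*
 * Write into 'out' the OIDs of all partitioned tables in the tree (root
 * first, breadth-first), followed by the OIDs of all leaf partitions.
 * Returns the number of OIDs written.
 */
static int
toy_expand_order(const ToyRel *root, unsigned int *out)
{
	const ToyRel *parted[MAX_RELS];
	unsigned int leaves[MAX_RELS];
	int			nparted = 0,
				nleaves = 0,
				n = 0,
				i,
				j;

	parted[nparted++] = root;

	/* The queue grows while we iterate, much like the forboth() loop. */
	for (i = 0; i < nparted; i++)
	{
		for (j = 0; j < parted[i]->nparts; j++)
		{
			const ToyRel *child = parted[i]->parts[j];

			if (child->is_partitioned)
			{
				assert(nparted < MAX_RELS);
				parted[nparted++] = child;	/* visit its partitions later */
			}
			else
			{
				assert(nleaves < MAX_RELS);
				leaves[nleaves++] = child->oid;
			}
		}
	}

	/* Partitioned tables first, then the leaf partitions. */
	for (i = 0; i < nparted; i++)
		out[n++] = parted[i]->oid;
	for (i = 0; i < nleaves; i++)
		out[n++] = leaves[i];

	return n;
}

/* Sample tree: root 1 -> { leaf 10, partitioned 20 -> { 21, 22 }, leaf 30 } */
static const ToyRel p2a = {21, 0, 0, {NULL}};
static const ToyRel p2b = {22, 0, 0, {NULL}};
static const ToyRel p1 = {10, 0, 0, {NULL}};
static const ToyRel p2 = {20, 1, 2, {&p2a, &p2b}};
static const ToyRel p3 = {30, 0, 0, {NULL}};
static const ToyRel toy_root = {1, 1, 3, {&p1, &p2, &p3}};

static const ToyRel *
toy_demo_tree(void)
{
	return &toy_root;
}
```

In this toy model the leaves of a sub-partitioned table come after the direct leaves of the levels above it, because each partitioned table's partitions are only looked at when that table comes up in the queue.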
From b2e3f1508534ddc49f192437b44810a6f0a0f1b4 Mon Sep 17 00:00:00 2001 From: amit <amitlangot...@gmail.com> Date: Mon, 24 Jul 2017 18:59:57 +0900 Subject: [PATCH 4/4] Decouple RelationGetPartitionDispatchInfo() from executor Currently it and the structure it generates viz. PartitionDispatch objects are too coupled with the executor's tuple-routing code. In particular, it's pretty undesirable that it makes it the responsibility of the caller to release some resources, such as relcache references and tuple table slots. That makes it harder to use in places other than where it's currently being used. After this refactoring, ExecSetupPartitionTupleRouting() now needs to do some of the work that was previously done in RelationGetPartitionDispatchInfo() and expand_inherited_rtentry() no longer needs to do some things that it used to. --- src/backend/catalog/partition.c | 309 +++++++++++++++++---------------- src/backend/commands/copy.c | 35 ++-- src/backend/executor/execMain.c | 146 ++++++++++++++-- src/backend/executor/nodeModifyTable.c | 29 ++-- src/backend/optimizer/prep/prepunion.c | 32 +--- src/include/catalog/partition.h | 52 +++--- src/include/executor/executor.h | 4 +- src/include/nodes/execnodes.h | 53 +++++- 8 files changed, 399 insertions(+), 261 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index efc025ec42..36f5c80b4f 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -105,6 +105,24 @@ typedef struct PartitionRangeBound bool lower; /* this is the lower (vs upper) bound */ } PartitionRangeBound; +/*----------------------- + * PartitionDispatchData - information of partitions of one partitioned table + * in a partition tree + * + * partkey Partition key of the table + * partdesc Partition descriptor of the table + * indexes Array with partdesc->nparts members (for details on what the + * individual value represents, see the comments in + * RelationGetPartitionDispatchInfo()) + 
*----------------------- + */ +typedef struct PartitionDispatchData +{ + PartitionKey partkey; /* Points into the table's relcache entry */ + PartitionDesc partdesc; /* Ditto */ + int *indexes; +} PartitionDispatchData; + static int32 qsort_partition_list_value_cmp(const void *a, const void *b, void *arg); static int32 qsort_partition_rbound_cmp(const void *a, const void *b, @@ -981,181 +999,165 @@ get_partition_qual_relid(Oid relid) } /* - * Append OIDs of rel's partitions to the list 'partoids' and for each OID, - * append pointer rel to the list 'parents'. - */ -#define APPEND_REL_PARTITION_OIDS(rel, partoids, parents) \ - do\ - {\ - int i;\ - for (i = 0; i < (rel)->rd_partdesc->nparts; i++)\ - {\ - (partoids) = lappend_oid((partoids), (rel)->rd_partdesc->oids[i]);\ - (parents) = lappend((parents), (rel));\ - }\ - } while(0) - -/* * RelationGetPartitionDispatchInfo - * Returns information necessary to route tuples down a partition tree + * Returns necessary information for each partition in the partition + * tree rooted at rel * - * The number of elements in the returned array (that is, the number of - * PartitionDispatch objects for the partitioned tables in the partition tree) - * is returned in *num_parted and a list of the OIDs of all the leaf - * partitions of rel is returned in *leaf_part_oids. + * Information returned includes the following: *ptinfos contains a list of + * PartitionedTableInfo objects, one for each partitioned table (with at least + * one member, that is, one for the root partitioned table), *leaf_part_oids + * contains a list of the OIDs of all the leaf partitions. * - * All the relations in the partition tree (including 'rel') must have been - * locked (using at least the AccessShareLock) by the caller. 
+ * We require that the caller has locked at least the partitioned tables in the + * partition tree (including 'rel') using at least the AccessShareLock, + * because we need to look at their relcache entries to get PartitionKey and + * PartitionDesc. */ -PartitionDispatch * +void RelationGetPartitionDispatchInfo(Relation rel, - int *num_parted, List **leaf_part_oids) + List **ptinfos, List **leaf_part_oids) { - PartitionDispatchData **pd; - List *all_parts = NIL, - *all_parents = NIL, - *parted_rels, - *parted_rel_parents; + List *all_parts, + *all_parents; ListCell *lc1, *lc2; int i, - k, offset; /* * We rely on the relcache to traverse the partition tree to build both - * the leaf partition OIDs list and the array of PartitionDispatch objects - * for the partitioned tables in the tree. That means every partitioned - * table in the tree must be locked, which is fine since we require the - * caller to lock all the partitions anyway. + * the leaf partition OIDs list and the list of PartitionedTableInfo + * objects for partitioned tables. That means every partitioned table in + * the tree must be locked, which is fine since the callers must have done + * that already. * * For every partitioned table in the tree, starting with the root * partitioned table, add its relcache entry to parted_rels, while also * queuing its partitions (in the order in which they appear in the * partition descriptor) to be looked at later in the same loop. This is * a bit tricky but works because the foreach() macro doesn't fetch the - * next list element until the bottom of the loop. + * next list element until the bottom of the loop. Non-partitioned tables + * are simply added to the leaf partitions list. 
*/ - *num_parted = 1; - parted_rels = list_make1(rel); - /* Root partitioned table has no parent, so NULL for parent */ - parted_rel_parents = list_make1(NULL); - APPEND_REL_PARTITION_OIDS(rel, all_parts, all_parents); + i = offset = 0; + *ptinfos = *leaf_part_oids = NIL; + + /* Start with the root table. */ + all_parts = list_make1_oid(RelationGetRelid(rel)); + all_parents = list_make1_oid(InvalidOid); forboth(lc1, all_parts, lc2, all_parents) { - Oid partrelid = lfirst_oid(lc1); - Relation parent = lfirst(lc2); + Oid partrelid = lfirst_oid(lc1); + Oid parentrelid = lfirst_oid(lc2); if (get_rel_relkind(partrelid) == RELKIND_PARTITIONED_TABLE) { - /* - * Already locked by the caller. Note that it is the - * responsibility of the caller to close the below relcache entry, - * once done using the information being collected here (for - * example, in ExecEndModifyTable). - */ - Relation partrel = heap_open(partrelid, NoLock); + int j, + k; + Relation partrel; + PartitionKey partkey; + PartitionDesc partdesc; + PartitionedTableInfo *ptinfo; + PartitionDispatch pd; + + if (partrelid != RelationGetRelid(rel)) + partrel = heap_open(partrelid, NoLock); + else + partrel = rel; - (*num_parted)++; - parted_rels = lappend(parted_rels, partrel); - parted_rel_parents = lappend(parted_rel_parents, parent); - APPEND_REL_PARTITION_OIDS(partrel, all_parts, all_parents); - } - } + partkey = RelationGetPartitionKey(partrel); + partdesc = RelationGetPartitionDesc(partrel); + + ptinfo = (PartitionedTableInfo *) + palloc0(sizeof(PartitionedTableInfo)); + ptinfo->relid = partrelid; + ptinfo->parentid = parentrelid; + + ptinfo->pd = pd = (PartitionDispatchData *) + palloc0(sizeof(PartitionDispatchData)); + pd->partkey = partkey; - /* - * We want to create two arrays - one for leaf partitions and another for - * partitioned tables (including the root table and internal partitions). 
- * While we only create the latter here, leaf partition array of suitable - * objects (such as, ResultRelInfo) is created by the caller using the - * list of OIDs we return. Indexes into these arrays get assigned in a - * breadth-first manner, whereby partitions of any given level are placed - * consecutively in the respective arrays. - */ - pd = (PartitionDispatchData **) palloc(*num_parted * - sizeof(PartitionDispatchData *)); - *leaf_part_oids = NIL; - i = k = offset = 0; - forboth(lc1, parted_rels, lc2, parted_rel_parents) - { - Relation partrel = lfirst(lc1); - Relation parent = lfirst(lc2); - PartitionKey partkey = RelationGetPartitionKey(partrel); - TupleDesc tupdesc = RelationGetDescr(partrel); - PartitionDesc partdesc = RelationGetPartitionDesc(partrel); - int j, - m; - - pd[i] = (PartitionDispatch) palloc(sizeof(PartitionDispatchData)); - pd[i]->reldesc = partrel; - pd[i]->key = partkey; - pd[i]->keystate = NIL; - pd[i]->partdesc = partdesc; - if (parent != NULL) - { /* - * For every partitioned table other than root, we must store a - * tuple table slot initialized with its tuple descriptor and a - * tuple conversion map to convert a tuple from its parent's - * rowtype to its own. That is to make sure that we are looking at - * the correct row using the correct tuple descriptor when - * computing its partition key for tuple routing. + * XXX- do we need a pinning mechanism for partition descriptors + * so that their references can be managed independently of + * the parent relcache entry? Like PinPartitionDesc(partdesc)? 
*/ - pd[i]->tupslot = MakeSingleTupleTableSlot(tupdesc); - pd[i]->tupmap = convert_tuples_by_name(RelationGetDescr(parent), - tupdesc, - gettext_noop("could not convert row type")); - } - else - { - /* Not required for the root partitioned table */ - pd[i]->tupslot = NULL; - pd[i]->tupmap = NULL; - } - pd[i]->indexes = (int *) palloc(partdesc->nparts * sizeof(int)); + pd->partdesc = partdesc; - /* - * Indexes corresponding to the internal partitions are multiplied by - * -1 to distinguish them from those of leaf partitions. Encountering - * an index >= 0 means we found a leaf partition, which is immediately - * returned as the partition we are looking for. A negative index - * means we found a partitioned table, whose PartitionDispatch object - * is located at the above index multiplied back by -1. Using the - * PartitionDispatch object, search is continued further down the - * partition tree. - */ - m = 0; - for (j = 0; j < partdesc->nparts; j++) - { - Oid partrelid = partdesc->oids[j]; + /* + * The values contained in the following array correspond to + * indexes of this table's partitions in the global sequence of + * all the partitions contained in the partition tree rooted at + * rel, traversed in a breadth-first manner. The values should be + * such that we will be able to distinguish the leaf partitions + * from the non-leaf partitions, because they are returned + * to the caller in separate structures from where they will be + * accessed. The way that's done is described below: + * + * Leaf partition OIDs are put into the global leaf_part_oids list, + * and for each one, the value stored is its ordinal position in + * the list minus 1. + * + * PartitionedTableInfo objects corresponding to partitions that + * are partitioned tables are put into the global ptinfos[] list, + * and for each one, the value stored is its ordinal position in + * the list minus 1, multiplied by -1. 
+ * + * So while looking at the values in the indexes array, if one + * gets zero or a positive value, then it's a leaf partition; + * otherwise, it's a partitioned table. + */ + pd->indexes = (int *) palloc(partdesc->nparts * sizeof(int)); - if (get_rel_relkind(partrelid) != RELKIND_PARTITIONED_TABLE) - { - *leaf_part_oids = lappend_oid(*leaf_part_oids, partrelid); - pd[i]->indexes[j] = k++; - } - else + k = 0; + for (j = 0; j < partdesc->nparts; j++) { + Oid partrelid = partdesc->oids[j]; + /* - * offset denotes the number of partitioned tables of upper - * levels including those of the current level. Any partition - * of this table must belong to the next level and hence will - * be placed after the last partitioned table of this level. + * Queue this partition so that it will be processed later + * by the outer loop. */ - pd[i]->indexes[j] = -(1 + offset + m); - m++; + all_parts = lappend_oid(all_parts, partrelid); + all_parents = lappend_oid(all_parents, + RelationGetRelid(partrel)); + + if (get_rel_relkind(partrelid) != RELKIND_PARTITIONED_TABLE) + { + *leaf_part_oids = lappend_oid(*leaf_part_oids, partrelid); + pd->indexes[j] = i++; + } + else + { + /* + * offset denotes the number of partitioned tables that + * we have already processed. k counts the number of + * partitions of this table that were found to be + * partitioned tables. + */ + pd->indexes[j] = -(1 + offset + k); + k++; + } } - } - i++; - /* - * This counts the number of partitioned tables at upper levels - * including those of the current level. - */ - offset += m; + offset += k; + + /* + * Release the relation descriptor. Lock that we have on the + * table will keep the PartitionDesc that is pointing into + * RelationData intact, a pointer to which we hope to keep + * through this transaction's commit. + * (XXX - how true is that?) 
+ */ + if (partrel != rel) + heap_close(partrel, NoLock); + + *ptinfos = lappend(*ptinfos, ptinfo); + } } - return pd; + Assert(i == list_length(*leaf_part_oids)); + Assert((offset + 1) == list_length(*ptinfos)); } /* Module-local functions */ @@ -1872,7 +1874,7 @@ generate_partition_qual(Relation rel) * ---------------- */ void -FormPartitionKeyDatum(PartitionDispatch pd, +FormPartitionKeyDatum(PartitionKeyInfo *keyinfo, TupleTableSlot *slot, EState *estate, Datum *values, @@ -1881,20 +1883,21 @@ FormPartitionKeyDatum(PartitionDispatch pd, ListCell *partexpr_item; int i; - if (pd->key->partexprs != NIL && pd->keystate == NIL) + if (keyinfo->key->partexprs != NIL && keyinfo->keystate == NIL) { /* Check caller has set up context correctly */ Assert(estate != NULL && GetPerTupleExprContext(estate)->ecxt_scantuple == slot); /* First time through, set up expression evaluation state */ - pd->keystate = ExecPrepareExprList(pd->key->partexprs, estate); + keyinfo->keystate = ExecPrepareExprList(keyinfo->key->partexprs, + estate); } - partexpr_item = list_head(pd->keystate); - for (i = 0; i < pd->key->partnatts; i++) + partexpr_item = list_head(keyinfo->keystate); + for (i = 0; i < keyinfo->key->partnatts; i++) { - AttrNumber keycol = pd->key->partattrs[i]; + AttrNumber keycol = keyinfo->key->partattrs[i]; Datum datum; bool isNull; @@ -1931,13 +1934,13 @@ FormPartitionKeyDatum(PartitionDispatch pd, * the latter case. 
*/ int -get_partition_for_tuple(PartitionDispatch *pd, +get_partition_for_tuple(PartitionTupleRoutingInfo **ptrinfos, TupleTableSlot *slot, EState *estate, - PartitionDispatchData **failed_at, + PartitionTupleRoutingInfo **failed_at, TupleTableSlot **failed_slot) { - PartitionDispatch parent; + PartitionTupleRoutingInfo *parent; Datum values[PARTITION_MAX_KEYS]; bool isnull[PARTITION_MAX_KEYS]; int cur_offset, @@ -1948,11 +1951,11 @@ get_partition_for_tuple(PartitionDispatch *pd, TupleTableSlot *ecxt_scantuple_old = ecxt->ecxt_scantuple; /* start with the root partitioned table */ - parent = pd[0]; + parent = ptrinfos[0]; while (true) { - PartitionKey key = parent->key; - PartitionDesc partdesc = parent->partdesc; + PartitionKey key = parent->pd->partkey; + PartitionDesc partdesc = parent->pd->partdesc; TupleTableSlot *myslot = parent->tupslot; TupleConversionMap *map = parent->tupmap; @@ -1984,7 +1987,7 @@ get_partition_for_tuple(PartitionDispatch *pd, * So update ecxt_scantuple accordingly. */ ecxt->ecxt_scantuple = slot; - FormPartitionKeyDatum(parent, slot, estate, values, isnull); + FormPartitionKeyDatum(parent->keyinfo, slot, estate, values, isnull); if (key->strategy == PARTITION_STRATEGY_RANGE) { @@ -2055,13 +2058,13 @@ get_partition_for_tuple(PartitionDispatch *pd, *failed_slot = slot; break; } - else if (parent->indexes[cur_index] >= 0) + else if (parent->pd->indexes[cur_index] >= 0) { - result = parent->indexes[cur_index]; + result = parent->pd->indexes[cur_index]; break; } else - parent = pd[-parent->indexes[cur_index]]; + parent = ptrinfos[-parent->pd->indexes[cur_index]]; } error_exit: diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index a258965c20..e17a339349 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -165,8 +165,8 @@ typedef struct CopyStateData bool volatile_defexprs; /* is any of defexprs volatile? 
*/ List *range_table; - PartitionDispatch *partition_dispatch_info; - int num_dispatch; /* Number of entries in the above array */ + PartitionTupleRoutingInfo **ptrinfos; + int num_parted; /* Number of entries in the above array */ int num_partitions; /* Number of members in the following arrays */ ResultRelInfo *partitions; /* Per partition result relation */ TupleConversionMap **partition_tupconv_maps; @@ -1425,7 +1425,7 @@ BeginCopy(ParseState *pstate, /* Initialize state for CopyFrom tuple routing. */ if (is_from && rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) { - PartitionDispatch *partition_dispatch_info; + PartitionTupleRoutingInfo **ptrinfos; ResultRelInfo *partitions; TupleConversionMap **partition_tupconv_maps; TupleTableSlot *partition_tuple_slot; @@ -1434,13 +1434,13 @@ BeginCopy(ParseState *pstate, ExecSetupPartitionTupleRouting(rel, 1, - &partition_dispatch_info, + &ptrinfos, &partitions, &partition_tupconv_maps, &partition_tuple_slot, &num_parted, &num_partitions); - cstate->partition_dispatch_info = partition_dispatch_info; - cstate->num_dispatch = num_parted; + cstate->ptrinfos = ptrinfos; + cstate->num_parted = num_parted; cstate->partitions = partitions; cstate->num_partitions = num_partitions; cstate->partition_tupconv_maps = partition_tupconv_maps; @@ -2495,7 +2495,7 @@ CopyFrom(CopyState cstate) if ((resultRelInfo->ri_TrigDesc != NULL && (resultRelInfo->ri_TrigDesc->trig_insert_before_row || resultRelInfo->ri_TrigDesc->trig_insert_instead_row)) || - cstate->partition_dispatch_info != NULL || + cstate->ptrinfos != NULL || cstate->volatile_defexprs) { useHeapMultiInsert = false; @@ -2573,7 +2573,7 @@ CopyFrom(CopyState cstate) ExecStoreTuple(tuple, slot, InvalidBuffer, false); /* Determine the partition to heap_insert the tuple into */ - if (cstate->partition_dispatch_info) + if (cstate->ptrinfos) { int leaf_part_index; TupleConversionMap *map; @@ -2587,7 +2587,7 @@ CopyFrom(CopyState cstate) * partition, respectively. 
*/ leaf_part_index = ExecFindPartition(resultRelInfo, - cstate->partition_dispatch_info, + cstate->ptrinfos, slot, estate); Assert(leaf_part_index >= 0 && @@ -2819,23 +2819,20 @@ CopyFrom(CopyState cstate) ExecCloseIndices(resultRelInfo); - /* Close all the partitioned tables, leaf partitions, and their indices */ - if (cstate->partition_dispatch_info) + /* Close all the leaf partitions and their indices */ + if (cstate->ptrinfos) { int i; /* - * Remember cstate->partition_dispatch_info[0] corresponds to the root - * partitioned table, which we must not try to close, because it is - * the main target table of COPY that will be closed eventually by - * DoCopy(). Also, tupslot is NULL for the root partitioned table. + * cstate->ptrinfos[0] corresponds to the root partitioned table, for + * which we didn't create tupslot. */ - for (i = 1; i < cstate->num_dispatch; i++) + for (i = 1; i < cstate->num_parted; i++) { - PartitionDispatch pd = cstate->partition_dispatch_info[i]; + PartitionTupleRoutingInfo *ptrinfo = cstate->ptrinfos[i]; - heap_close(pd->reldesc, NoLock); - ExecDropSingleTupleTableSlot(pd->tupslot); + ExecDropSingleTupleTableSlot(ptrinfo->tupslot); } for (i = 0; i < cstate->num_partitions; i++) { diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index 3db8b6f971..790fd8f208 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -3215,8 +3215,8 @@ EvalPlanQualEnd(EPQState *epqstate) * tuple routing for partitioned tables * * Output arguments: - * 'pd' receives an array of PartitionDispatch objects with one entry for - * every partitioned table in the partition tree + * 'ptrinfos' receives an array of PartitionTupleRoutingInfo objects with one + * entry for each partitioned table in the partition tree * 'partitions' receives an array of ResultRelInfo objects with one entry for * every leaf partition in the partition tree * 'tup_conv_maps' receives an array of TupleConversionMap objects with one @@ 
-3238,7 +3238,7 @@ EvalPlanQualEnd(EPQState *epqstate) void ExecSetupPartitionTupleRouting(Relation rel, Index resultRTindex, - PartitionDispatch **pd, + PartitionTupleRoutingInfo ***ptrinfos, ResultRelInfo **partitions, TupleConversionMap ***tup_conv_maps, TupleTableSlot **partition_tuple_slot, @@ -3246,10 +3246,12 @@ ExecSetupPartitionTupleRouting(Relation rel, { TupleDesc tupDesc = RelationGetDescr(rel); List *leaf_parts; + List *ptinfos = NIL; ListCell *cell; int i; ResultRelInfo *leaf_part_rri; List *all_parts; + Relation parent; /* * Get the information about the partition tree after locking all the @@ -3258,7 +3260,125 @@ ExecSetupPartitionTupleRouting(Relation rel, all_parts = find_all_inheritors(RelationGetRelid(rel), RowExclusiveLock, NULL, NULL); list_free(all_parts); - *pd = RelationGetPartitionDispatchInfo(rel, num_parted, &leaf_parts); + + RelationGetPartitionDispatchInfo(rel, &ptinfos, &leaf_parts); + + /* + * The ptinfos list contains PartitionedTableInfo objects for all the + * partitioned tables in the partition tree. Using the information + * therein, we construct an array of PartitionTupleRoutingInfo objects + * to be used during tuple-routing. + */ + *num_parted = list_length(ptinfos); + *ptrinfos = (PartitionTupleRoutingInfo **) palloc0(*num_parted * + sizeof(PartitionTupleRoutingInfo *)); + /* + * Free the ptinfos List structure itself as we go through (open-coded + * list_free). + */ + i = 0; + cell = list_head(ptinfos); + parent = NULL; + while (cell) + { + ListCell *tmp = cell; + PartitionedTableInfo *ptinfo = lfirst(tmp), + *next_ptinfo = NULL; + Relation partrel; + PartitionTupleRoutingInfo *ptrinfo; + + if (lnext(tmp)) + next_ptinfo = lfirst(lnext(tmp)); + + /* As mentioned above, the partitioned tables have been locked. 
*/ + if (ptinfo->relid != RelationGetRelid(rel)) + partrel = heap_open(ptinfo->relid, NoLock); + else + partrel = rel; + + ptrinfo = (PartitionTupleRoutingInfo *) + palloc0(sizeof(PartitionTupleRoutingInfo)); + ptrinfo->relid = ptinfo->relid; + + /* Stash a reference to this PartitionDispatch. */ + ptrinfo->pd = ptinfo->pd; + + /* State for extracting partition key from tuples will go here. */ + ptrinfo->keyinfo = (PartitionKeyInfo *) + palloc0(sizeof(PartitionKeyInfo)); + ptrinfo->keyinfo->key = RelationGetPartitionKey(partrel); + ptrinfo->keyinfo->keystate = NIL; + + /* + * For every partitioned table other than root, we must store a tuple + * table slot initialized with its tuple descriptor and a tuple + * conversion map to convert a tuple from its parent's rowtype to its + * own. That is to make sure that we are looking at the correct row + * using the correct tuple descriptor when computing its partition key + * for tuple routing. + */ + if (ptinfo->parentid != InvalidOid) + { + TupleDesc tupdesc = RelationGetDescr(partrel); + + /* Open the parent relation descriptor if not already done. */ + if (ptinfo->parentid == RelationGetRelid(rel)) + { + parent = rel; + } + else if (parent == NULL) + { + /* Locked by RelationGetPartitionDispatchInfo(). */ + parent = heap_open(ptinfo->parentid, NoLock); + } + + ptrinfo->tupslot = MakeSingleTupleTableSlot(tupdesc); + ptrinfo->tupmap = convert_tuples_by_name(RelationGetDescr(parent), + tupdesc, + gettext_noop("could not convert row type")); + + /* + * Close the parent descriptor, if the next partitioned table in + * the list is not a sibling, because it will have a different + * parent if so. + */ + if (parent != NULL && parent != rel && + next_ptinfo != NULL && + next_ptinfo->parentid != ptinfo->parentid) + { + heap_close(parent, NoLock); + parent = NULL; + } + + /* + * Release the relation descriptor. 
Lock that we have on the + * table will keep the PartitionDesc that is pointing into + * RelationData intact, a pointer to which we hope to keep + * through this transaction's commit. + * (XXX - how true is that?) + */ + if (partrel != rel) + heap_close(partrel, NoLock); + } + else + { + /* Not required for the root partitioned table */ + ptrinfo->tupslot = NULL; + ptrinfo->tupmap = NULL; + } + + (*ptrinfos)[i++] = ptrinfo; + + /* Free the ListCell. */ + cell = lnext(cell); + pfree(tmp); + } + + /* Free the List itself. */ + if (ptinfos) + pfree(ptinfos); + + /* For leaf partitions, we build ResultRelInfos and TupleConversionMaps. */ *num_partitions = list_length(leaf_parts); *partitions = (ResultRelInfo *) palloc(*num_partitions * sizeof(ResultRelInfo)); @@ -3284,7 +3404,7 @@ ExecSetupPartitionTupleRouting(Relation rel, * All the partitions were locked above. Note that the relcache * entries will be closed by ExecEndModifyTable(). */ - partrel = heap_open(lfirst_oid(cell), NoLock); + partrel = heap_open(lfirst_oid(cell), RowExclusiveLock); part_tupdesc = RelationGetDescr(partrel); /* @@ -3297,7 +3417,7 @@ ExecSetupPartitionTupleRouting(Relation rel, * partition from the parent's type to the partition's. */ (*tup_conv_maps)[i] = convert_tuples_by_name(tupDesc, part_tupdesc, - gettext_noop("could not convert row type")); + gettext_noop("could not convert row type")); InitResultRelInfo(leaf_part_rri, partrel, @@ -3331,11 +3451,13 @@ ExecSetupPartitionTupleRouting(Relation rel, * by get_partition_for_tuple() unchanged. 
*/ int -ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd, - TupleTableSlot *slot, EState *estate) +ExecFindPartition(ResultRelInfo *resultRelInfo, + PartitionTupleRoutingInfo **ptrinfos, + TupleTableSlot *slot, + EState *estate) { int result; - PartitionDispatchData *failed_at; + PartitionTupleRoutingInfo *failed_at; TupleTableSlot *failed_slot; /* @@ -3345,7 +3467,7 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd, if (resultRelInfo->ri_PartitionCheck) ExecPartitionCheck(resultRelInfo, slot, estate); - result = get_partition_for_tuple(pd, slot, estate, + result = get_partition_for_tuple(ptrinfos, slot, estate, &failed_at, &failed_slot); if (result < 0) { @@ -3355,9 +3477,9 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd, char *val_desc; ExprContext *ecxt = GetPerTupleExprContext(estate); - failed_rel = failed_at->reldesc; + failed_rel = heap_open(failed_at->relid, NoLock); ecxt->ecxt_scantuple = failed_slot; - FormPartitionKeyDatum(failed_at, failed_slot, estate, + FormPartitionKeyDatum(failed_at->keyinfo, failed_slot, estate, key_values, key_isnull); val_desc = ExecBuildSlotPartitionKeyDescription(failed_rel, key_values, diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index 36b2b43bc6..9cf974c938 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -277,7 +277,7 @@ ExecInsert(ModifyTableState *mtstate, resultRelInfo = estate->es_result_relation_info; /* Determine the partition to heap_insert the tuple into */ - if (mtstate->mt_partition_dispatch_info) + if (mtstate->mt_ptrinfos) { int leaf_part_index; TupleConversionMap *map; @@ -291,7 +291,7 @@ ExecInsert(ModifyTableState *mtstate, * respectively. 
*/ leaf_part_index = ExecFindPartition(resultRelInfo, - mtstate->mt_partition_dispatch_info, + mtstate->mt_ptrinfos, slot, estate); Assert(leaf_part_index >= 0 && @@ -1486,7 +1486,7 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate) int numResultRelInfos; /* Find the set of partitions so that we can find their TupleDescs. */ - if (mtstate->mt_partition_dispatch_info != NULL) + if (mtstate->mt_ptrinfos != NULL) { /* * For INSERT via partitioned table, so we need TupleDescs based @@ -1910,7 +1910,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) if (operation == CMD_INSERT && rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) { - PartitionDispatch *partition_dispatch_info; + PartitionTupleRoutingInfo **ptrinfos; ResultRelInfo *partitions; TupleConversionMap **partition_tupconv_maps; TupleTableSlot *partition_tuple_slot; @@ -1919,13 +1919,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) ExecSetupPartitionTupleRouting(rel, node->nominalRelation, - &partition_dispatch_info, + &ptrinfos, &partitions, &partition_tupconv_maps, &partition_tuple_slot, &num_parted, &num_partitions); - mtstate->mt_partition_dispatch_info = partition_dispatch_info; - mtstate->mt_num_dispatch = num_parted; + mtstate->mt_ptrinfos = ptrinfos; + mtstate->mt_num_parted = num_parted; mtstate->mt_partitions = partitions; mtstate->mt_num_partitions = num_partitions; mtstate->mt_partition_tupconv_maps = partition_tupconv_maps; @@ -2335,19 +2335,16 @@ ExecEndModifyTable(ModifyTableState *node) } /* - * Close all the partitioned tables, leaf partitions, and their indices + * Close all the leaf partitions and their indices. * - * Remember node->mt_partition_dispatch_info[0] corresponds to the root - * partitioned table, which we must not try to close, because it is the - * main target table of the query that will be closed by ExecEndPlan(). - * Also, tupslot is NULL for the root partitioned table. 
+ * node->mt_ptrinfos[0] corresponds to the root partitioned + * table, for which we didn't create tupslot. */ - for (i = 1; i < node->mt_num_dispatch; i++) + for (i = 1; i < node->mt_num_parted; i++) { - PartitionDispatch pd = node->mt_partition_dispatch_info[i]; + PartitionTupleRoutingInfo *ptrinfo = node->mt_ptrinfos[i]; - heap_close(pd->reldesc, NoLock); - ExecDropSingleTupleTableSlot(pd->tupslot); + ExecDropSingleTupleTableSlot(ptrinfo->tupslot); } for (i = 0; i < node->mt_num_partitions; i++) { diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c index 734a7e55df..2d6f3900c3 100644 --- a/src/backend/optimizer/prep/prepunion.c +++ b/src/backend/optimizer/prep/prepunion.c @@ -1459,48 +1459,30 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) */ if (rte->relkind == RELKIND_PARTITIONED_TABLE) { - List *leaf_part_oids; - int num_parted, - i; - PartitionDispatch *pds; + List *leaf_part_oids, + *ptinfos; /* Discard the original list. */ list_free(inhOIDs); inhOIDs = NIL; /* Request partitioning information. */ - pds = RelationGetPartitionDispatchInfo(oldrelation, &num_parted, - &leaf_part_oids); + RelationGetPartitionDispatchInfo(oldrelation, &ptinfos, + &leaf_part_oids); /* * First collect the partitioned child table OIDs, which includes the * root parent at the head. */ - for (i = 0; i < num_parted; i++) + foreach(l, ptinfos) { - PartitionDispatch pd = pds[i]; + PartitionedTableInfo *ptinfo = lfirst(l); - inhOIDs = lappend_oid(inhOIDs, RelationGetRelid(pd->reldesc)); + inhOIDs = lappend_oid(inhOIDs, ptinfo->relid); } /* Concatenate the leaf partition OIDs. */ inhOIDs = list_concat(inhOIDs, leaf_part_oids); - - /* - * Release the resources that RelationGetPartitionDispatchInfo - * acquired for us but we don't really need in this case. Note that - * we don't touch the root partitioned table itself by starting the - * loop with 1, not 0. 
-		 */
-		for (i = 1; i < num_parted; i++)
-		{
-			PartitionDispatch pd = pds[i];
-
-			heap_close(pd->reldesc, NoLock);
-			ExecDropSingleTupleTableSlot(pd->tupslot);
-			if (pd->tupmap)
-				pfree(pd->tupmap);
-		}
 	}
 
 	/* Scan the inheritance set and expand it */
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2283c675e9..7b53baf847 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -39,36 +39,23 @@ typedef struct PartitionDescData
 
 typedef struct PartitionDescData *PartitionDesc;
 
-/*-----------------------
- * PartitionDispatch - information about one partitioned table in a partition
- * hierarchy required to route a tuple to one of its partitions
- *
- *	reldesc		Relation descriptor of the table
- *	key			Partition key information of the table
- *	keystate	Execution state required for expressions in the partition key
- *	partdesc	Partition descriptor of the table
- *	tupslot		A standalone TupleTableSlot initialized with this table's tuple
- *				descriptor
- *	tupmap		TupleConversionMap to convert from the parent's rowtype to
- *				this table's rowtype (when extracting the partition key of a
- *				tuple just before routing it through this table)
- *	indexes		Array with partdesc->nparts members (for details on what
- *				individual members represent, see how they are set in
- *				RelationGetPartitionDispatchInfo())
- *-----------------------
+typedef struct PartitionDispatchData *PartitionDispatch;
+
+/*
+ * Information about one partitioned table in a given partition tree
  */
-typedef struct PartitionDispatchData
+typedef struct PartitionedTableInfo
 {
-	Relation	reldesc;
-	PartitionKey key;
-	List	   *keystate;		/* list of ExprState */
-	PartitionDesc partdesc;
-	TupleTableSlot *tupslot;
-	TupleConversionMap *tupmap;
-	int		   *indexes;
-} PartitionDispatchData;
+	Oid			relid;
+	Oid			parentid;
 
-typedef struct PartitionDispatchData *PartitionDispatch;
+	/*
+	 * This contains information about bounds of the partitions of this
+	 * table and about where individual partitions are placed in the global
+	 * partition tree.
+	 */
+	PartitionDispatch pd;
+} PartitionedTableInfo;
 
 extern void RelationBuildPartitionDesc(Relation relation);
 extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
@@ -86,17 +73,18 @@ extern List *map_partition_varattnos(List *expr, int target_varno,
 extern List *RelationGetPartitionQual(Relation rel);
 extern Expr *get_partition_qual_relid(Oid relid);
 
+extern void RelationGetPartitionDispatchInfo(Relation rel,
+								 List **ptinfos, List **leaf_part_oids);
+
 /* For tuple routing */
-extern PartitionDispatch *RelationGetPartitionDispatchInfo(Relation rel,
-								 int *num_parted, List **leaf_part_oids);
-extern void FormPartitionKeyDatum(PartitionDispatch pd,
+extern void FormPartitionKeyDatum(PartitionKeyInfo *keyinfo,
 					  TupleTableSlot *slot,
 					  EState *estate,
 					  Datum *values,
 					  bool *isnull);
-extern int get_partition_for_tuple(PartitionDispatch *pd,
+extern int get_partition_for_tuple(PartitionTupleRoutingInfo **ptrinfos,
 						TupleTableSlot *slot,
 						EState *estate,
-						PartitionDispatchData **failed_at,
+						PartitionTupleRoutingInfo **failed_at,
 						TupleTableSlot **failed_slot);
 
 #endif							/* PARTITION_H */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 60326f9d03..6e1d3a6d2f 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -208,13 +208,13 @@ extern void EvalPlanQualSetTuple(EPQState *epqstate, Index rti,
 extern HeapTuple EvalPlanQualGetTuple(EPQState *epqstate, Index rti);
 extern void ExecSetupPartitionTupleRouting(Relation rel,
 							   Index resultRTindex,
-							   PartitionDispatch **pd,
+							   PartitionTupleRoutingInfo ***ptrinfos,
 							   ResultRelInfo **partitions,
 							   TupleConversionMap ***tup_conv_maps,
 							   TupleTableSlot **partition_tuple_slot,
 							   int *num_parted, int *num_partitions);
 extern int ExecFindPartition(ResultRelInfo *resultRelInfo,
-				  PartitionDispatch *pd,
+				  PartitionTupleRoutingInfo **ptrinfos,
 				  TupleTableSlot *slot,
 				  EState *estate);
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 577499465d..07e50e0914 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -414,6 +414,55 @@ typedef struct ResultRelInfo
 	Relation	ri_PartitionRoot;
 } ResultRelInfo;
 
+/* Forward declarations, to avoid including other headers */
+typedef struct PartitionKeyData *PartitionKey;
+typedef struct PartitionDispatchData *PartitionDispatch;
+
+/*
+ * PartitionKeyInfo - execution state for the partition key of a
+ *					  partitioned table
+ *
+ * keystate is the execution state required for expressions contained in the
+ * partition key.  It is NIL until initialized by FormPartitionKeyDatum() if
+ * and when it is called; for example, during tuple routing through a given
+ * partitioned table.
+ */
+typedef struct PartitionKeyInfo
+{
+	PartitionKey key;			/* Points into the table's relcache entry */
+	List	   *keystate;
+} PartitionKeyInfo;
+
+/*
+ * PartitionTupleRoutingInfo - information required for tuple-routing
+ *							   through one partitioned table in a partition
+ *							   tree
+ */
+typedef struct PartitionTupleRoutingInfo
+{
+	/* OID of the table */
+	Oid			relid;
+
+	/* Information about the table's partitions */
+	PartitionDispatch pd;
+
+	/* See comment above the definition of PartitionKeyInfo */
+	PartitionKeyInfo *keyinfo;
+
+	/*
+	 * A standalone TupleTableSlot initialized with this table's tuple
+	 * descriptor
+	 */
+	TupleTableSlot *tupslot;
+
+	/*
+	 * TupleConversionMap to convert from the parent's rowtype to this table's
+	 * rowtype (when extracting the partition key of a tuple just before
+	 * routing it through this table)
+	 */
+	TupleConversionMap *tupmap;
+} PartitionTupleRoutingInfo;
+
 /* ----------------
  *	  EState information
  *
@@ -970,9 +1019,9 @@ typedef struct ModifyTableState
 	TupleTableSlot *mt_existing;	/* slot to store existing target tuple in */
 	List	   *mt_excludedtlist;	/* the excluded pseudo relation's tlist */
 	TupleTableSlot *mt_conflproj;	/* CONFLICT ... SET ... projection target */
-	struct PartitionDispatchData **mt_partition_dispatch_info;	/* Tuple-routing support info */
-	int			mt_num_dispatch;	/* Number of entries in the above array */
+	struct PartitionTupleRoutingInfo **mt_ptrinfos;
+	int			mt_num_parted;	/* Number of entries in the above array */
 	int			mt_num_partitions;	/* Number of members in the following
 									 * arrays */
 	ResultRelInfo *mt_partitions;	/* Per partition result relation */
-- 
2.11.0