Hi David,

Thanks for taking a look.
On 2018/07/15 17:34, David Rowley wrote:
> I've looked over the code and the ExecUseUpdateResultRelForRouting()
> function is broken.  Your while loop only skips partitions for the
> current partitioned table, it does not skip ModifyTable subnodes that
> belong to other partitioned tables.
>
> You can use the following.  The code does not find the t1_a2 subnode.
>
> create table t1 (a int, b int) partition by list(a);
> create table t1_a1 partition of t1 for values in(1) partition by list(b);
> create table t1_a2 partition of t1 for values in(2);
> create table t1_a1_b1 partition of t1_a1 for values in(1);
> create table t1_a1_b2 partition of t1_a1 for values in(2);
> insert into t1 values(2,2);
>
> update t1 set a = a;

Hmm, it indeed is broken.

> I think there might not be enough information to make this work
> correctly, as if you change the loop to skip subnodes, then it won't
> work in cases where the partition[0] was pruned.
>
> I've another patch sitting here, partly done, that changes
> pg_class.relispartition into pg_class.relpartitionparent.  If we had
> that then we could code your loop to work correctly.
>
> Alternatively, I guess we could just ignore the UPDATE's ResultRelInfos
> and just build new ones.  Unsure if there's actually a reason we need
> to reuse the existing ones, is there?

We try to use the existing ones because, back when the patch was written
(not by me, though), it was thought that redoing for each partition all
the work that InitResultRelInfo does, where an existing ResultRelInfo
could have been reused instead, would cumulatively end up being more
expensive than identifying the reusable ones by scanning the partition
and result rel arrays in parallel.  I don't remember seeing a benchmark
demonstrating the benefit of doing this, though.  Maybe one was posted,
but I don't remember having looked at it closely.

> I think you'd need to know the owning partition and skip subnodes that
> don't belong to pd->reldesc.
> Alternatively, a hashtable could be built with all the oids belonging
> to pd->reldesc, then we could loop over the update_rris finding
> subnodes that can be found in the hashtable.  Likely this will be much
> slower than the sort of merge lookup that the previous code did.

I think one option is to simply give up on the idea of matching *all*
UPDATE result rels that belong to a given partitioned table
(pd->reldesc) in one call of ExecUseUpdateResultRelForRouting.  Instead,
pass the index (in pd->partdesc->oids) of the partition whose
ResultRelInfo is to be found, loop over all UPDATE result rels looking
for it, and return immediately upon finding one, after having stored its
pointer in proute->partitions.  In the worst case, we'll end up scanning
the UPDATE result rels array once for every partition that gets touched,
but maybe such an UPDATE query is uncommon, and even if one occurs,
tuple routing might be the last of its bottlenecks.

I have implemented that approach in the updated patch.  That meant I
also needed to change things so that ExecUseUpdateResultRelForRouting is
now only called by ExecFindPartition, because with the new arrangement
it's useless to call it from ExecSetupPartitionTupleRouting.  Moreover,
an UPDATE may not use tuple routing at all, even if the fact that the
partition key is being updated results in
ExecSetupPartitionTupleRouting being called.

> Another thing that I don't like is the PARTITION_ROUTING_MAXSIZE code.
> The code seems to assume that there can only be at the most 65536
> partitions, but I don't think there's any code which restricts us to
> that.  There is code in the planner that will bork when trying to
> create a RangeTblEntry up that high, but as far as I know that won't
> be noticed on the INSERT path.  I don't think this code has any
> business knowing what the special varnos are set to either.  It would
> be better to just remove the limit and suffer the small wasted array
> space.
> I understand you've probably coded it like this due to the similar
> code that was in my patch, but with mine I knew the total number of
> partitions.  Your patch does not.

OK, I've changed it to UINT_MAX.

> Other thoughts on the patch:
>
> I wonder if it's worth having syscache keep a count on the number of
> sub-partitioned tables a partition has.  If there are none in the root
> partition then the partition_dispatch_info can be initialized with
> just 1 element to store the root details.  Although, maybe it's not
> worth it to reduce the array size by 7 elements.

Hmm, yes.  Allocating space for 8 pointers when we really need 1 is not
too bad, if the alternative is to modify partcache.c.

> Also, I'm a bit confused why you change the comments in
> execPartition.h for PartitionTupleRouting to be inline again.  I
> brought those out of line as I thought the complexity of the code
> warranted that.  Your inlining them again goes against what all the
> other structs do in that file.

They were out-of-line to begin with, but that started to become
distracting while I was updating them.  I agree about being consistent,
though, so I have moved them back to where they were.  I have
significantly rewritten them to be clearer, however.

> Apart from that, I think the idea is promising.  We'll just need to
> find a way to make ExecUseUpdateResultRelForRouting work correctly.

Let me know what you think of the code in the updated patch.

Thanks,
Amit
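To make the per-partition lookup concrete, here is a standalone sketch of the linear scan over the UPDATE result rels (toy types and names invented for illustration; the real code works with ResultRelInfo and the proute arrays, and caches the match so each partition is only searched for once):

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;

/* Toy stand-in for a subplan result rel; layout is illustrative only. */
typedef struct ToyResultRel
{
    Oid         relid;
} ToyResultRel;

/*
 * Scan the UPDATE subplan result rels for the one whose relation OID
 * matches the partition chosen by tuple routing.  Mirrors the idea of
 * passing a single partition's OID and returning as soon as a match is
 * found (or -1 if none), instead of matching all result rels in one pass.
 */
static int
find_update_result_rel(const ToyResultRel *update_rris, int num_update_rris,
                       Oid part_oid)
{
    int         i;

    for (i = 0; i < num_update_rris; i++)
    {
        if (update_rris[i].relid == part_oid)
            return i;           /* caller would cache this in proute */
    }
    return -1;                  /* caller must build a fresh ResultRelInfo */
}
```

A miss (-1) is what triggers building a brand-new ResultRelInfo for the partition, so the scan cost is only paid once per touched partition.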
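The lazy array growth the patch uses for proute->partitions and proute->partition_dispatch_info can be sketched in isolation like this (a toy growable pointer array under invented names; the patch does the same with palloc/repalloc and its PARTITION_ROUTING_INITSIZE/MAXSIZE constants):

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define ROUTING_INITSIZE 8
#define ROUTING_MAXSIZE  UINT_MAX   /* effectively no limit, per the change */

/* Minimal growable pointer array, illustrative only. */
typedef struct PtrArray
{
    void      **items;
    unsigned int nitems;
    unsigned int allocsize;
} PtrArray;

static void
ptrarray_init(PtrArray *a)
{
    a->allocsize = ROUTING_INITSIZE;
    a->nitems = 0;
    a->items = malloc(sizeof(void *) * a->allocsize);
}

/* Append, doubling the allocation when full, capped at ROUTING_MAXSIZE. */
static unsigned int
ptrarray_append(PtrArray *a, void *item)
{
    unsigned int idx = a->nitems++;

    if (idx >= a->allocsize)
    {
        unsigned int newsize = a->allocsize * 2;

        /* clamp to the cap; the second test also guards against overflow */
        if (newsize > ROUTING_MAXSIZE || newsize < a->allocsize)
            newsize = ROUTING_MAXSIZE;
        a->items = realloc(a->items, sizeof(void *) * newsize);
        a->allocsize = newsize;
    }
    a->items[idx] = item;
    return idx;
}
```

Starting at 8 entries means an INSERT that routes to only a handful of partitions never pays for the full partition count, while a bulk load that touches many partitions amortizes the repallocs geometrically.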
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 25bec76c1d..44cf3bba12 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -2621,10 +2621,8 @@ CopyFrom(CopyState cstate) * will get us the ResultRelInfo and TupleConversionMap for the * partition, respectively. */ - leaf_part_index = ExecFindPartition(resultRelInfo, - proute->partition_dispatch_info, - slot, - estate); + leaf_part_index = ExecFindPartition(mtstate, resultRelInfo, + proute, slot, estate); Assert(leaf_part_index >= 0 && leaf_part_index < proute->num_partitions); @@ -2644,10 +2642,8 @@ CopyFrom(CopyState cstate) * to the selected partition. */ saved_resultRelInfo = resultRelInfo; - resultRelInfo = ExecGetPartitionInfo(mtstate, - saved_resultRelInfo, - proute, estate, - leaf_part_index); + Assert(proute->partitions[leaf_part_index] != NULL); + resultRelInfo = proute->partitions[leaf_part_index]; /* * For ExecInsertIndexTuples() to work on the partition's indexes diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c index 1a3a67dd0d..23c766b5fc 100644 --- a/src/backend/executor/execPartition.c +++ b/src/backend/executor/execPartition.c @@ -31,17 +31,19 @@ #include "utils/rls.h" #include "utils/ruleutils.h" -static ResultRelInfo *ExecInitPartitionInfo(ModifyTableState *mtstate, +#define PARTITION_ROUTING_INITSIZE 8 +#define PARTITION_ROUTING_MAXSIZE UINT_MAX + +static int ExecUseUpdateResultRelForRouting(ModifyTableState *mtstate, + PartitionTupleRouting *proute, + PartitionDispatch pd, int partidx); +static int ExecInitPartitionInfo(ModifyTableState *mtstate, ResultRelInfo *resultRelInfo, PartitionTupleRouting *proute, - EState *estate, int partidx); -static PartitionDispatch *RelationGetPartitionDispatchInfo(Relation rel, - int *num_parted, Oid **leaf_part_oids, - int *n_leaf_part_oids); -static void get_partition_dispatch_recurse(Relation rel, Relation parent, - List **pds, Oid **leaf_part_oids, - int 
*n_leaf_part_oids, - int *leaf_part_oid_size); + EState *estate, + PartitionDispatch parent, int partidx); +static PartitionDispatch ExecInitPartitionDispatchInfo(PartitionTupleRouting *proute, + Oid partoid, PartitionDispatch parent_pd, int part_index); static void FormPartitionKeyDatum(PartitionDispatch pd, TupleTableSlot *slot, EState *estate, @@ -68,127 +70,61 @@ static void find_matching_subplans_recurse(PartitionPruneState *prunestate, * Note that all the relations in the partition tree are locked using the * RowExclusiveLock mode upon return from this function. * - * While we allocate the arrays of pointers of ResultRelInfo and - * TupleConversionMap for all partitions here, actual objects themselves are - * lazily allocated for a given partition if a tuple is actually routed to it; - * see ExecInitPartitionInfo. However, if the function is invoked for UPDATE - * tuple routing, the caller will have already initialized ResultRelInfo's for - * each partition present in the ModifyTable's subplans. These are reused and - * assigned to their respective slot in the aforementioned array. For such - * partitions, we delay setting up objects such as TupleConversionMap until - * those are actually chosen as the partitions to route tuples to. See - * ExecPrepareTupleRouting. + * This is called during the initialization of a COPY FROM command or of a + * INSERT/UPDATE query. We provisionally allocate space to hold + * PARTITION_ROUTING_INITSIZE number of PartitionDispatch and ResultRelInfo + * pointers in their respective arrays. The arrays will be doubled in + * size via repalloc (subject to the limit of PARTITION_ROUTING_MAXSIZE + * entries at most) if and when we run out of space, as more partitions need + * to be added. Since we already have the root parent open, its + * PartitionDispatch is created here. 
+ * + * PartitionDispatch object of a non-root partitioned table or ResultRelInfo + * of a leaf partition is allocated and added to the respective array when + * it is encountered for the first time in ExecFindPartition. As mentioned + * above, we might need to expand the respective array before storing it. + * + * Tuple conversion maps (either child to parent and/or vice versa) and the + * array(s) to hold them are allocated only if needed. */ PartitionTupleRouting * ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel) { - int i; PartitionTupleRouting *proute; - int nparts; ModifyTable *node = mtstate ? (ModifyTable *) mtstate->ps.plan : NULL; - /* - * Get the information about the partition tree after locking all the - * partitions. - */ + /* Lock all the partitions. */ (void) find_all_inheritors(RelationGetRelid(rel), RowExclusiveLock, NULL); - proute = (PartitionTupleRouting *) palloc(sizeof(PartitionTupleRouting)); - proute->partition_dispatch_info = - RelationGetPartitionDispatchInfo(rel, &proute->num_dispatch, - &proute->partition_oids, &nparts); - proute->num_partitions = nparts; - proute->partitions = - (ResultRelInfo **) palloc0(nparts * sizeof(ResultRelInfo *)); + proute = (PartitionTupleRouting *) palloc0(sizeof(PartitionTupleRouting)); + proute->partition_root = rel; + proute->dispatch_allocsize = PARTITION_ROUTING_INITSIZE; + proute->partition_dispatch_info = (PartitionDispatchData **) + palloc(sizeof(PartitionDispatchData) * PARTITION_ROUTING_INITSIZE); /* - * Allocate an array to store ResultRelInfos that we'll later allocate. - * It is common that not all partitions will have tuples routed to them, - * so we'll refrain from allocating enough space for all partitions here. - * Let's just start with something small and make it bigger only when - * needed. Storing these separately rather than relying on the - *'partitions' array allows us to quickly identify which ResultRelInfos we - * must teardown at the end. 
+ * Initialize this table's PartitionDispatch object. Since the root + * parent doesn't itself have any parent, last two parameters are + * not used. */ - proute->partitions_init_size = Min(nparts, 8); - - proute->partitions_init = (ResultRelInfo **) - palloc(proute->partitions_init_size * sizeof(ResultRelInfo *)); - - proute->num_partitions_init = 0; - - /* We only allocate this when we need to store the first non-NULL map */ - proute->parent_child_tupconv_maps = NULL; - - proute->child_parent_tupconv_maps = NULL; - + (void) ExecInitPartitionDispatchInfo(proute, RelationGetRelid(rel), NULL, + 0); + proute->num_dispatch = 1; + proute->partitions_allocsize = PARTITION_ROUTING_INITSIZE; + proute->partitions = (ResultRelInfo **) + palloc(sizeof(ResultRelInfo *) * PARTITION_ROUTING_INITSIZE); + proute->num_partitions = 0; /* - * Initialize an empty slot that will be used to manipulate tuples of any - * given partition's rowtype. It is attached to the caller-specified node - * (such as ModifyTableState) and released when the node finishes - * processing. + * If UPDATE needs to do tuple routing, we'll need a slot that will + * transiently store the tuple being routed using the root parent's + * rowtype. We must set up at least this slot, because it's needed even + * before tuple routing begins. Other necessary information is + * initialized when tuple routing code calls + * ExecUseUpdateResultRelForRouting. */ - proute->partition_tuple_slot = MakeTupleTableSlot(NULL); - - /* Set up details specific to the type of tuple routing we are doing. 
*/ if (node && node->operation == CMD_UPDATE) - { - ResultRelInfo *update_rri = NULL; - int num_update_rri = 0, - update_rri_index = 0; - - update_rri = mtstate->resultRelInfo; - num_update_rri = list_length(node->plans); - proute->subplan_partition_offsets = - palloc(num_update_rri * sizeof(int)); - proute->num_subplan_partition_offsets = num_update_rri; - proute->root_tuple_slot = MakeTupleTableSlot(NULL); - - for (i = 0; i < nparts; i++) - { - Oid leaf_oid = proute->partition_oids[i]; - - /* - * If the leaf partition is already present in the per-subplan - * result rels, we re-use that rather than initialize a new result - * rel. The per-subplan resultrels and the resultrels of the leaf - * partitions are both in the same canonical order. So while going - * through the leaf partition oids, we need to keep track of the - * next per-subplan result rel to be looked for in the leaf - * partition resultrels. - */ - if (update_rri_index < num_update_rri && - RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) == leaf_oid) - { - ResultRelInfo *leaf_part_rri; - - leaf_part_rri = &update_rri[update_rri_index]; - - /* - * This is required in order to convert the partition's tuple - * to be compatible with the root partitioned table's tuple - * descriptor. When generating the per-subplan result rels, - * this was not set. - */ - leaf_part_rri->ri_PartitionRoot = rel; - - /* Remember the subplan offset for this ResultRelInfo */ - proute->subplan_partition_offsets[update_rri_index] = i; - - update_rri_index++; - - proute->partitions[i] = leaf_part_rri; - } - } - - /* - * We should have found all the per-subplan resultrels in the leaf - * partitions. 
- */ - Assert(update_rri_index == num_update_rri); - } else { proute->root_tuple_slot = NULL; @@ -196,26 +132,38 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel) proute->num_subplan_partition_offsets = 0; } + /* We only allocate this when we need to store the first non-NULL map */ + proute->parent_child_tupconv_maps = NULL; + proute->child_parent_tupconv_maps = NULL; + + /* + * Initialize an empty slot that will be used to manipulate tuples of any + * given partition's rowtype. + */ + proute->partition_tuple_slot = MakeTupleTableSlot(NULL); + return proute; } /* - * ExecFindPartition -- Find a leaf partition in the partition tree rooted - * at parent, for the heap tuple contained in *slot + * ExecFindPartition -- Find a leaf partition for the tuple contained in *slot * * estate must be non-NULL; we'll need it to compute any expressions in the * partition key(s) * * If no leaf partition is found, this routine errors out with the appropriate - * error message, else it returns the leaf partition sequence number - * as an index into the array of (ResultRelInfos of) all leaf partitions in - * the partition tree. + * error message, else it returns the index of the leaf partition's + * ResultRelInfo in the proute->partitions array. */ int -ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd, +ExecFindPartition(ModifyTableState *mtstate, + ResultRelInfo *resultRelInfo, + PartitionTupleRouting *proute, TupleTableSlot *slot, EState *estate) { - int result; + ModifyTable *node = (ModifyTable *) mtstate->ps.plan; + PartitionDispatch *pd = proute->partition_dispatch_info; + int result = -1; Datum values[PARTITION_MAX_KEYS]; bool isnull[PARTITION_MAX_KEYS]; Relation rel; @@ -272,10 +220,7 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd, * partitions to begin with. 
*/ if (partdesc->nparts == 0) - { - result = -1; break; - } cur_index = get_partition_for_tuple(rel, values, isnull); @@ -285,17 +230,64 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd, * next parent to find a partition of. */ if (cur_index < 0) - { - result = -1; break; - } - else if (parent->indexes[cur_index] >= 0) + + if (partdesc->is_leaf[cur_index]) { - result = parent->indexes[cur_index]; + /* Get the ResultRelInfo of this leaf partition. */ + if (parent->indexes[cur_index] >= 0) + { + /* + * Already assigned (either created fresh or reused from the + * set of UPDATE result rels.) + */ + Assert(parent->indexes[cur_index] < proute->num_partitions); + result = parent->indexes[cur_index]; + } + else if (node && node->operation == CMD_UPDATE) + { + /* Try to assign an existing result rel for tuple routing. */ + result = ExecUseUpdateResultRelForRouting(mtstate, proute, + parent, cur_index); + + /* We may not really have found one. */ + Assert(result < 0 || + parent->indexes[cur_index] < proute->num_partitions); + } + + /* We need to create one afresh. */ + if (result < 0) + { + result = ExecInitPartitionInfo(mtstate, resultRelInfo, + proute, estate, + parent, cur_index); + Assert(result >= 0 && result < proute->num_partitions); + } break; } else - parent = pd[-parent->indexes[cur_index]]; + { + /* Get the PartitionDispatch of this parent. */ + if (parent->indexes[cur_index] >= 0) + { + /* Already allocated. */ + Assert(parent->indexes[cur_index] < proute->num_dispatch); + parent = pd[parent->indexes[cur_index]]; + } + else + { + /* Not yet, allocate one. */ + PartitionDispatch new_parent; + + new_parent = + ExecInitPartitionDispatchInfo(proute, + partdesc->oids[cur_index], + parent, cur_index); + Assert(parent->indexes[cur_index] >= 0 && + parent->indexes[cur_index] < proute->num_dispatch); + parent = new_parent; + } + } } /* A partition was not found. 
*/ @@ -318,65 +310,110 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd, } /* - * ExecGetPartitionInfo - * Fetch ResultRelInfo for partidx + * ExecUseUpdateResultRelForRouting + * Checks if any of the ResultRelInfo's created by ExecInitModifyTable + * belongs to the passed in partition, and if so, stores its pointer in + * in proute so that it can be used as the target of tuple routing * - * Sets up ResultRelInfo, if not done already. + * Return value is the index at which the found result rel is stored in proute + * or -1 if none found. */ -ResultRelInfo * -ExecGetPartitionInfo(ModifyTableState *mtstate, - ResultRelInfo *resultRelInfo, - PartitionTupleRouting *proute, - EState *estate, int partidx) +static int +ExecUseUpdateResultRelForRouting(ModifyTableState *mtstate, + PartitionTupleRouting *proute, + PartitionDispatch pd, + int partidx) { - ResultRelInfo *result = proute->partitions[partidx]; + Oid partoid = pd->partdesc->oids[partidx]; + ModifyTable *node = (ModifyTable *) mtstate->ps.plan; + ResultRelInfo *update_result_rels = NULL; + int num_update_result_rels = 0; + int i; + int part_result_rel_index = -1; - if (result) - return result; + update_result_rels = mtstate->resultRelInfo; + num_update_result_rels = list_length(node->plans); - result = ExecInitPartitionInfo(mtstate, - resultRelInfo, - proute, - estate, - partidx); - Assert(result); - - proute->partitions[partidx] = result; - - /* - * Record the ones setup so far in setup order. This makes the cleanup - * operation more efficient when very few have been setup. - */ - if (proute->num_partitions_init == proute->partitions_init_size) + /* If here for the first time, initialize necessary info in proute. 
*/ + if (proute->subplan_partition_offsets == NULL) { - /* First allocate more space if the array is not large enough */ - proute->partitions_init_size = - Min(proute->partitions_init_size * 2, proute->num_partitions); - - proute->partitions_init = (ResultRelInfo **) - repalloc(proute->partitions_init, - proute->partitions_init_size * sizeof(ResultRelInfo *)); + proute->subplan_partition_offsets = + palloc(num_update_result_rels * sizeof(int)); + memset(proute->subplan_partition_offsets, -1, + num_update_result_rels * sizeof(int)); + proute->num_subplan_partition_offsets = num_update_result_rels; } - proute->partitions_init[proute->num_partitions_init++] = result; + /* + * Go through UPDATE result rels and save the pointers of those that + * belong to this table's partitions in proute. + */ + for (i = 0; i < num_update_result_rels; i++) + { + ResultRelInfo *update_result_rel = &update_result_rels[i]; - Assert(proute->num_partitions_init <= proute->num_partitions); + if (partoid != RelationGetRelid(update_result_rel->ri_RelationDesc)) + continue; - return result; + /* Found it. */ + + /* + * This is required in order to convert the partition's tuple + * to be compatible with the root partitioned table's tuple + * descriptor. When generating the per-subplan result rels, + * this was not set. + */ + update_result_rel->ri_PartitionRoot = proute->partition_root; + + /* + * Remember the index of this UPDATE result rel in the tuple + * routing partition array. + */ + proute->subplan_partition_offsets[i] = proute->num_partitions; + + /* + * Also, record in PartitionDispatch that we have a valid + * ResultRelInfo for this partition. 
+ */ + Assert(pd->indexes[partidx] == -1); + part_result_rel_index = proute->num_partitions++; + if (part_result_rel_index >= PARTITION_ROUTING_MAXSIZE) + elog(ERROR, "invalid partition index: %u", part_result_rel_index); + pd->indexes[partidx] = part_result_rel_index; + if (part_result_rel_index >= proute->partitions_allocsize) + { + /* Expand allocated place. */ + proute->partitions_allocsize = + Min(proute->partitions_allocsize * 2, + PARTITION_ROUTING_MAXSIZE); + proute->partitions = (ResultRelInfo **) + repalloc(proute->partitions, + sizeof(ResultRelInfo *) * + proute->partitions_allocsize); + } + proute->partitions[part_result_rel_index] = update_result_rel; + break; + } + + return part_result_rel_index; } /* * ExecInitPartitionInfo * Initialize ResultRelInfo and other information for a partition * - * Returns the ResultRelInfo + * This also stores it in the proute->partitions array at the next + * available index, possibly expanding the array if there isn't any space + * left in it, and returns the index where it's stored. */ -static ResultRelInfo * +static int ExecInitPartitionInfo(ModifyTableState *mtstate, ResultRelInfo *resultRelInfo, PartitionTupleRouting *proute, - EState *estate, int partidx) + EState *estate, + PartitionDispatch parent, int partidx) { + Oid partoid = parent->partdesc->oids[partidx]; ModifyTable *node = (ModifyTable *) mtstate->ps.plan; Relation rootrel = resultRelInfo->ri_RelationDesc, partrel; @@ -385,12 +422,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, MemoryContext oldContext; AttrNumber *part_attnos = NULL; bool found_whole_row; + int part_result_rel_index; /* * We locked all the partitions in ExecSetupPartitionTupleRouting * including the leaf partitions. 
*/ - partrel = heap_open(proute->partition_oids[partidx], NoLock); + partrel = heap_open(partoid, NoLock); /* * Keep ResultRelInfo and other information for this partition in the @@ -566,8 +604,23 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, &mtstate->ps, RelationGetDescr(partrel)); } + part_result_rel_index = proute->num_partitions++; + if (part_result_rel_index >= PARTITION_ROUTING_MAXSIZE) + elog(ERROR, "invalid partition index: %u", part_result_rel_index); + parent->indexes[partidx] = part_result_rel_index; + if (part_result_rel_index >= proute->partitions_allocsize) + { + /* Expand allocated place. */ + proute->partitions_allocsize = + Min(proute->partitions_allocsize * 2, PARTITION_ROUTING_MAXSIZE); + proute->partitions = (ResultRelInfo **) + repalloc(proute->partitions, + sizeof(ResultRelInfo *) * proute->partitions_allocsize); + } + /* Set up information needed for routing tuples to the partition. */ - ExecInitRoutingInfo(mtstate, estate, proute, leaf_part_rri, partidx); + ExecInitRoutingInfo(mtstate, estate, proute, leaf_part_rri, + part_result_rel_index); /* * If there is an ON CONFLICT clause, initialize state for it. @@ -626,7 +679,8 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, TupleConversionMap *map; map = proute->parent_child_tupconv_maps ? - proute->parent_child_tupconv_maps[partidx] : NULL; + proute->parent_child_tupconv_maps[part_result_rel_index] : + NULL; Assert(node->onConflictSet != NIL); Assert(resultRelInfo->ri_onConflict != NULL); @@ -729,12 +783,12 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, } } - Assert(proute->partitions[partidx] == NULL); - proute->partitions[partidx] = leaf_part_rri; + /* Save here for later use. 
*/ + proute->partitions[part_result_rel_index] = leaf_part_rri; MemoryContextSwitchTo(oldContext); - return leaf_part_rri; + return part_result_rel_index; } /* @@ -766,10 +820,26 @@ ExecInitRoutingInfo(ModifyTableState *mtstate, if (map) { + int new_size; + /* Allocate parent child map array only if we need to store a map */ - if (!proute->parent_child_tupconv_maps) + if (proute->parent_child_tupconv_maps == NULL) + { + proute->parent_child_tupconv_maps_allocsize = new_size = + PARTITION_ROUTING_INITSIZE; proute->parent_child_tupconv_maps = (TupleConversionMap **) - palloc0(proute->num_partitions * sizeof(TupleConversionMap *)); + palloc0(sizeof(TupleConversionMap *) * new_size); + } + /* We may have ran out of the initially allocated space. */ + else if (partidx >= proute->parent_child_tupconv_maps_allocsize) + { + proute->parent_child_tupconv_maps_allocsize = new_size = + Min(proute->parent_child_tupconv_maps_allocsize * 2, + PARTITION_ROUTING_MAXSIZE); + proute->parent_child_tupconv_maps = (TupleConversionMap **) + repalloc( proute->parent_child_tupconv_maps, + sizeof(TupleConversionMap *) * new_size); + } proute->parent_child_tupconv_maps[partidx] = map; } @@ -788,6 +858,91 @@ ExecInitRoutingInfo(ModifyTableState *mtstate, } /* + * ExecInitPartitionDispatchInfo + * Initialize PartitionDispatch for a partitioned table + * + * This also stores it in the proute->partition_dispatch_info array at the + * specified index ('dispatchidx'), possibly expanding the array if there + * isn't space left in it. 
+ */ +static PartitionDispatch +ExecInitPartitionDispatchInfo(PartitionTupleRouting *proute, Oid partoid, + PartitionDispatch parent_pd, int part_index) +{ + Relation rel; + TupleDesc tupdesc; + PartitionDesc partdesc; + PartitionKey partkey; + PartitionDispatch pd; + int dispatchidx; + + if (partoid != RelationGetRelid(proute->partition_root)) + rel = heap_open(partoid, NoLock); + else + rel = proute->partition_root; + tupdesc = RelationGetDescr(rel); + partdesc = RelationGetPartitionDesc(rel); + partkey = RelationGetPartitionKey(rel); + + pd = (PartitionDispatch) palloc(sizeof(PartitionDispatchData)); + pd->reldesc = rel; + pd->key = partkey; + pd->keystate = NIL; + pd->partdesc = partdesc; + if (parent_pd != NULL) + { + /* + * For every partitioned table other than the root, we must store a + * tuple table slot initialized with its tuple descriptor and a tuple + * conversion map to convert a tuple from its parent's rowtype to its + * own. That is to make sure that we are looking at the correct row + * using the correct tuple descriptor when computing its partition key + * for tuple routing. + */ + pd->tupslot = MakeSingleTupleTableSlot(tupdesc); + pd->tupmap = + convert_tuples_by_name(RelationGetDescr(parent_pd->reldesc), + tupdesc, + gettext_noop("could not convert row type")); + } + else + { + /* Not required for the root partitioned table */ + pd->tupslot = NULL; + pd->tupmap = NULL; + } + + pd->indexes = (int *) palloc(sizeof(int) * partdesc->nparts); + + /* + * Initialize with -1 to signify that the corresponding partition's + * ResultRelInfo or PartitionDispatch has not been created yet. + */ + memset(pd->indexes, -1, sizeof(int) * partdesc->nparts); + + dispatchidx = proute->num_dispatch++; + if (dispatchidx >= PARTITION_ROUTING_MAXSIZE) + elog(ERROR, "invalid partition index: %u", dispatchidx); + if (parent_pd) + parent_pd->indexes[part_index] = dispatchidx; + if (dispatchidx >= proute->dispatch_allocsize) + { + /* Expand allocated space. 
*/ + proute->dispatch_allocsize = + Min(proute->dispatch_allocsize * 2, PARTITION_ROUTING_MAXSIZE); + proute->partition_dispatch_info = (PartitionDispatchData **) + repalloc(proute->partition_dispatch_info, + sizeof(PartitionDispatchData *) * + proute->dispatch_allocsize); + } + + /* Save here for later use. */ + proute->partition_dispatch_info[dispatchidx] = pd; + + return pd; +} + +/* * ExecSetupChildParentMapForLeaf -- Initialize the per-leaf-partition * child-to-root tuple conversion map array. * @@ -805,13 +960,14 @@ ExecSetupChildParentMapForLeaf(PartitionTupleRouting *proute) * These array elements get filled up with maps on an on-demand basis. * Initially just set all of them to NULL. */ + proute->child_parent_tupconv_maps_allocsize = PARTITION_ROUTING_INITSIZE; proute->child_parent_tupconv_maps = (TupleConversionMap **) palloc0(sizeof(TupleConversionMap *) * - proute->num_partitions); + PARTITION_ROUTING_INITSIZE); /* Same is the case for this array. All the values are set to false */ proute->child_parent_map_not_required = - (bool *) palloc0(sizeof(bool) * proute->num_partitions); + (bool *) palloc0(sizeof(bool) * PARTITION_ROUTING_INITSIZE); } /* @@ -826,8 +982,9 @@ TupConvMapForLeaf(PartitionTupleRouting *proute, TupleConversionMap **map; TupleDesc tupdesc; - /* Don't call this if we're not supposed to be using this type of map. */ - Assert(proute->child_parent_tupconv_maps != NULL); + /* If nobody else set up the per-leaf maps array, do so ourselves. */ + if (proute->child_parent_tupconv_maps == NULL) + ExecSetupChildParentMapForLeaf(proute); /* If it's already known that we don't need a map, return NULL. */ if (proute->child_parent_map_not_required[leaf_index]) @@ -846,6 +1003,30 @@ TupConvMapForLeaf(PartitionTupleRouting *proute, gettext_noop("could not convert row type")); /* If it turns out no map is needed, remember for next time. */ + + /* We may have run out of the initially allocated space. 
*/ + if (leaf_index >= proute->child_parent_tupconv_maps_allocsize) + { + int new_size, + old_size; + + old_size = proute->child_parent_tupconv_maps_allocsize; + proute->child_parent_tupconv_maps_allocsize = new_size = + Min(proute->parent_child_tupconv_maps_allocsize * 2, + PARTITION_ROUTING_MAXSIZE); + proute->child_parent_tupconv_maps = (TupleConversionMap **) + repalloc(proute->child_parent_tupconv_maps, + sizeof(TupleConversionMap *) * new_size); + memset(proute->child_parent_tupconv_maps + old_size, 0, + sizeof(TupleConversionMap *) * (new_size - old_size)); + + proute->child_parent_map_not_required = (bool *) + repalloc(proute->child_parent_map_not_required, + sizeof(bool) * new_size); + memset(proute->child_parent_map_not_required + old_size, false, + sizeof(bool) * (new_size - old_size)); + } + proute->child_parent_map_not_required[leaf_index] = (*map == NULL); return *map; @@ -909,9 +1090,9 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate, ExecDropSingleTupleTableSlot(pd->tupslot); } - for (i = 0; i < proute->num_partitions_init; i++) + for (i = 0; i < proute->num_partitions; i++) { - ResultRelInfo *resultRelInfo = proute->partitions_init[i]; + ResultRelInfo *resultRelInfo = proute->partitions[i]; /* Allow any FDWs to shut down if they've been exercised */ if (resultRelInfo->ri_PartitionReadyForRouting && @@ -920,6 +1101,28 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate, resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state, resultRelInfo); + /* + * Check if this result rel is one of UPDATE subplan result rels, + * which if so, let ExecEndPlan() close it. 
+ */ + if (proute->subplan_partition_offsets) + { + int j; + bool found = false; + + for (j = 0; j < proute->num_subplan_partition_offsets; j++) + { + if (proute->subplan_partition_offsets[j] == i) + { + found = true; + break; + } + } + + if (found) + continue; + } + ExecCloseIndices(resultRelInfo); heap_close(resultRelInfo->ri_RelationDesc, NoLock); } @@ -931,211 +1134,6 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate, ExecDropSingleTupleTableSlot(proute->partition_tuple_slot); } -/* - * RelationGetPartitionDispatchInfo - * Returns an array of PartitionDispatch as is required for routing - * tuples to the correct partition. - * - * 'num_parted' is set to the size of the returned array and the - * 'leaf_part_oids' array is allocated and populated with each leaf partition - * Oid in the hierarchy. 'n_leaf_part_oids' is set to the size of that array. - * All the relations in the partition tree (including 'rel') must have been - * locked (using at least the AccessShareLock) by the caller.
- */ -static PartitionDispatch * -RelationGetPartitionDispatchInfo(Relation rel, - int *num_parted, Oid **leaf_part_oids, - int *n_leaf_part_oids) -{ - List *pdlist = NIL; - PartitionDispatchData **pd; - ListCell *lc; - int i; - int leaf_part_oid_size; - - Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE); - - *num_parted = 0; - *n_leaf_part_oids = 0; - - leaf_part_oid_size = 0; - *leaf_part_oids = NULL; - - get_partition_dispatch_recurse(rel, NULL, &pdlist, leaf_part_oids, - n_leaf_part_oids, &leaf_part_oid_size); - *num_parted = list_length(pdlist); - pd = (PartitionDispatchData **) palloc(*num_parted * - sizeof(PartitionDispatchData *)); - i = 0; - foreach(lc, pdlist) - { - pd[i++] = lfirst(lc); - } - - return pd; -} - -/* - * get_partition_dispatch_recurse - * Recursively expand partition tree rooted at rel - * - * As the partition tree is expanded in a depth-first manner, we populate - * '*pds' with PartitionDispatch objects of each partitioned table we find, - * and populate leaf_part_oids with each leaf partition OID found. - * - * Note that the order of OIDs of leaf partitions in leaf_part_oids matches - * the order in which the planner's expand_partitioned_rtentry() processes - * them. It's not necessarily the case that the offsets match up exactly, - * because constraint exclusion might prune away some partitions on the - * planner side, whereas we'll always have the complete list; but unpruned - * partitions will appear in the same order in the plan as they are returned - * here. - * - * Note: Callers must not attempt to pfree the 'leaf_part_oids' array. 
- */ -static void -get_partition_dispatch_recurse(Relation rel, Relation parent, - List **pds, Oid **leaf_part_oids, - int *n_leaf_part_oids, - int *leaf_part_oid_size) -{ - TupleDesc tupdesc = RelationGetDescr(rel); - PartitionDesc partdesc = RelationGetPartitionDesc(rel); - PartitionKey partkey = RelationGetPartitionKey(rel); - PartitionDispatch pd; - int i; - int nparts; - int oid_array_used; - int oid_array_size; - Oid *oid_array; - Oid *partdesc_oids; - bool *partdesc_subpartitions; - int *indexes; - - check_stack_depth(); - - /* Build a PartitionDispatch for this table and add it to *pds. */ - pd = (PartitionDispatch) palloc(sizeof(PartitionDispatchData)); - *pds = lappend(*pds, pd); - pd->reldesc = rel; - pd->key = partkey; - pd->keystate = NIL; - pd->partdesc = partdesc; - if (parent != NULL) - { - /* - * For every partitioned table other than the root, we must store a - * tuple table slot initialized with its tuple descriptor and a tuple - * conversion map to convert a tuple from its parent's rowtype to its - * own. That is to make sure that we are looking at the correct row - * using the correct tuple descriptor when computing its partition key - * for tuple routing. - */ - pd->tupslot = MakeSingleTupleTableSlot(tupdesc); - pd->tupmap = convert_tuples_by_name(RelationGetDescr(parent), - tupdesc, - gettext_noop("could not convert row type")); - } - else - { - /* Not required for the root partitioned table */ - pd->tupslot = NULL; - pd->tupmap = NULL; - - /* - * If the parent has no sub partitions then we can skip calculating - * all the leaf partitions and just return all the oids at this level. - * In this case, the indexes were also pre-calculated for us by the - * syscache code. - */ - if (!partdesc->hassubpart) - { - *leaf_part_oids = partdesc->oids; - /* XXX or should we memcpy this out of syscache? */ - pd->indexes = partdesc->indexes; - *n_leaf_part_oids = partdesc->nparts; - return; - } - } - - /* - * Go look at each partition of this table. 
If it's a leaf partition, - * simply add its OID to *leaf_part_oids. If it's a partitioned table, - * recursively call get_partition_dispatch_recurse(), so that its - * partitions are processed as well and a corresponding PartitionDispatch - * object gets added to *pds. - * - * The 'indexes' array is used when searching for a partition matching a - * given tuple. The actual value we store here depends on whether the - * array element belongs to a leaf partition or a subpartitioned table. - * For leaf partitions we store the index into *leaf_part_oids, and for - * sub-partitioned tables we store a negative version of the index into - * the *pds list. Both indexes are 0-based, but the first element of the - * *pds list is the root partition, so 0 always means the first leaf. When - * searching, if we see a negative value, the search must continue in the - * corresponding sub-partition; otherwise, we've identified the correct - * partition. - */ - oid_array_used = *n_leaf_part_oids; - oid_array_size = *leaf_part_oid_size; - oid_array = *leaf_part_oids; - nparts = partdesc->nparts; - - if (!oid_array) - { - oid_array_size = *leaf_part_oid_size = nparts; - *leaf_part_oids = (Oid *) palloc(sizeof(Oid) * nparts); - oid_array = *leaf_part_oids; - } - - partdesc_oids = partdesc->oids; - partdesc_subpartitions = partdesc->subpartitions; - - pd->indexes = indexes = (int *) palloc(nparts * sizeof(int)); - - for (i = 0; i < nparts; i++) - { - Oid partrelid = partdesc_oids[i]; - - if (!partdesc_subpartitions[i]) - { - if (oid_array_size <= oid_array_used) - { - oid_array_size *= 2; - oid_array = (Oid *) repalloc(oid_array, - sizeof(Oid) * oid_array_size); - } - - oid_array[oid_array_used] = partrelid; - indexes[i] = oid_array_used++; - } - else - { - /* - * We assume all tables in the partition tree were already locked - * by the caller. 
- */ - Relation partrel = heap_open(partrelid, NoLock); - - *n_leaf_part_oids = oid_array_used; - *leaf_part_oid_size = oid_array_size; - *leaf_part_oids = oid_array; - - indexes[i] = -list_length(*pds); - get_partition_dispatch_recurse(partrel, rel, pds, leaf_part_oids, - n_leaf_part_oids, leaf_part_oid_size); - - oid_array_used = *n_leaf_part_oids; - oid_array_size = *leaf_part_oid_size; - oid_array = *leaf_part_oids; - } - } - - *n_leaf_part_oids = oid_array_used; - *leaf_part_oid_size = oid_array_size; - *leaf_part_oids = oid_array; -} - /* ---------------- * FormPartitionKeyDatum * Construct values[] and isnull[] arrays for the partition key diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index 07b5f968aa..8b671c6426 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -68,7 +68,6 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate, ResultRelInfo *targetRelInfo, TupleTableSlot *slot); static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node); -static void ExecSetupChildParentMapForTcs(ModifyTableState *mtstate); static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate); static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node, int whichplan); @@ -1666,7 +1665,7 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate) if (mtstate->mt_transition_capture != NULL || mtstate->mt_oc_transition_capture != NULL) { - ExecSetupChildParentMapForTcs(mtstate); + ExecSetupChildParentMapForSubplan(mtstate); /* * Install the conversion map for the first plan for UPDATE and DELETE @@ -1709,15 +1708,12 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate, * value is to be used as an index into the arrays for the ResultRelInfo * and TupleConversionMap for the partition. 
*/ - partidx = ExecFindPartition(targetRelInfo, - proute->partition_dispatch_info, - slot, - estate); + partidx = ExecFindPartition(mtstate, targetRelInfo, proute, slot, estate); Assert(partidx >= 0 && partidx < proute->num_partitions); /* Get the ResultRelInfo corresponding to the selected partition. */ - partrel = ExecGetPartitionInfo(mtstate, targetRelInfo, proute, estate, - partidx); + Assert(proute->partitions[partidx] != NULL); + partrel = proute->partitions[partidx]; /* * Check whether the partition is routable if we didn't yet @@ -1825,17 +1821,6 @@ ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate) int i; /* - * First check if there is already a per-subplan array allocated. Even if - * there is already a per-leaf map array, we won't require a per-subplan - * one, since we will use the subplan offset array to convert the subplan - * index to per-leaf index. - */ - if (mtstate->mt_per_subplan_tupconv_maps || - (mtstate->mt_partition_tuple_routing && - mtstate->mt_partition_tuple_routing->child_parent_tupconv_maps)) - return; - - /* * Build array of conversion maps from each child's TupleDesc to the one * used in the target relation. The map pointers may be NULL when no * conversion is necessary, which is hopefully a common case. @@ -1857,78 +1842,17 @@ ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate) } /* - * Initialize the child-to-root tuple conversion map array required for - * capturing transition tuples. - * - * The map array can be indexed either by subplan index or by leaf-partition - * index. For transition tables, we need a subplan-indexed access to the map, - * and where tuple-routing is present, we also require a leaf-indexed access. - */ -static void -ExecSetupChildParentMapForTcs(ModifyTableState *mtstate) -{ - PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing; - - /* - * If partition tuple routing is set up, we will require partition-indexed - * access. 
In that case, create the map array indexed by partition; we - * will still be able to access the maps using a subplan index by - * converting the subplan index to a partition index using - * subplan_partition_offsets. If tuple routing is not set up, it means we - * don't require partition-indexed access. In that case, create just a - * subplan-indexed map. - */ - if (proute) - { - /* - * If a partition-indexed map array is to be created, the subplan map - * array has to be NULL. If the subplan map array is already created, - * we won't be able to access the map using a partition index. - */ - Assert(mtstate->mt_per_subplan_tupconv_maps == NULL); - - ExecSetupChildParentMapForLeaf(proute); - } - else - ExecSetupChildParentMapForSubplan(mtstate); -} - -/* * For a given subplan index, get the tuple conversion map. */ static TupleConversionMap * tupconv_map_for_subplan(ModifyTableState *mtstate, int whichplan) { - /* - * If a partition-index tuple conversion map array is allocated, we need - * to first get the index into the partition array. Exactly *one* of the - * two arrays is allocated. This is because if there is a partition array - * required, we don't require subplan-indexed array since we can translate - * subplan index into partition index. And, we create a subplan-indexed - * array *only* if partition-indexed array is not required. - */ + /* If nobody else set the per-subplan array of maps, do so ourselves. */ if (mtstate->mt_per_subplan_tupconv_maps == NULL) - { - int leaf_index; - PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing; + ExecSetupChildParentMapForSubplan(mtstate); - /* - * If subplan-indexed array is NULL, things should have been arranged - * to convert the subplan index to partition index.
- */ - Assert(proute && proute->subplan_partition_offsets != NULL && - whichplan < proute->num_subplan_partition_offsets); - - leaf_index = proute->subplan_partition_offsets[whichplan]; - - return TupConvMapForLeaf(proute, getTargetResultRelInfo(mtstate), - leaf_index); - } - else - { - Assert(whichplan >= 0 && whichplan < mtstate->mt_nplans); - return mtstate->mt_per_subplan_tupconv_maps[whichplan]; - } + Assert(whichplan >= 0 && whichplan < mtstate->mt_nplans); + return mtstate->mt_per_subplan_tupconv_maps[whichplan]; } /* ---------------------------------------------------------------- diff --git a/src/backend/utils/cache/partcache.c b/src/backend/utils/cache/partcache.c index b36b7366e5..aa82aa52eb 100644 --- a/src/backend/utils/cache/partcache.c +++ b/src/backend/utils/cache/partcache.c @@ -594,7 +594,7 @@ RelationBuildPartitionDesc(Relation rel) int next_index = 0; result->oids = (Oid *) palloc0(nparts * sizeof(Oid)); - result->subpartitions = (bool *) palloc(nparts * sizeof(bool)); + result->is_leaf = (bool *) palloc(nparts * sizeof(bool)); boundinfo = (PartitionBoundInfoData *) palloc0(sizeof(PartitionBoundInfoData)); @@ -775,7 +775,6 @@ RelationBuildPartitionDesc(Relation rel) } result->boundinfo = boundinfo; - result->hassubpart = false; /* unless we discover otherwise below */ /* * Now assign OIDs from the original array into mapped indexes of the @@ -786,33 +785,13 @@ RelationBuildPartitionDesc(Relation rel) for (i = 0; i < nparts; i++) { int index = mapping[i]; - bool subpart; result->oids[index] = oids[i]; - - subpart = (get_rel_relkind(oids[i]) == RELKIND_PARTITIONED_TABLE); /* Record if the partition is a subpartitioned table */ - result->subpartitions[index] = subpart; - result->hassubpart |= subpart; + result->is_leaf[index] = + (get_rel_relkind(oids[i]) != RELKIND_PARTITIONED_TABLE); } - /* - * If there are no subpartitions then we can pre-calculate the - * PartitionDispatch->indexes array. 
Doing this here saves quite a - * bit of overhead on simple queries which perform INSERTs or UPDATEs - * on partitioned tables with many partitions. The pre-calculation is - * very simple. All we need to store is a sequence of numbers from 0 - * to nparts - 1. - */ - if (!result->hassubpart) - { - result->indexes = (int *) palloc(nparts * sizeof(int)); - for (i = 0; i < nparts; i++) - result->indexes[i] = i; - } - else - result->indexes = NULL; - pfree(mapping); } diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h index a8c69ff224..8d20469c98 100644 --- a/src/include/catalog/partition.h +++ b/src/include/catalog/partition.h @@ -26,18 +26,11 @@ typedef struct PartitionDescData { int nparts; /* Number of partitions */ - Oid *oids; /* OIDs array of 'nparts' of partitions in - * partbound order */ - int *indexes; /* Stores index for corresponding 'oids' - * element for use in tuple routing, or NULL - * if hassubpart is true. - */ - bool *subpartitions; /* Array of 'nparts' set to true if the - * corresponding 'oids' element belongs to a - * sub-partitioned table. - */ - bool hassubpart; /* true if any oid belongs to a - * sub-partitioned table */ + Oid *oids; /* Array of length 'nparts' containing + * partition OIDs in order of the their + * bounds */ + bool *is_leaf; /* Array of length 'nparts' containing whether + * a partition is a leaf partition */ PartitionBoundInfo boundinfo; /* collection of partition bounds */ } PartitionDescData; diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h index 822f66f5e2..91b840e12f 100644 --- a/src/include/executor/execPartition.h +++ b/src/include/executor/execPartition.h @@ -50,72 +50,124 @@ typedef struct PartitionDispatchData typedef struct PartitionDispatchData *PartitionDispatch; /*----------------------- - * PartitionTupleRouting - Encapsulates all information required to execute - * tuple-routing between partitions. 
+ * PartitionTupleRouting - Encapsulates all information required to + * route a tuple inserted into a partitioned table to one of its leaf + * partitions * - * partition_dispatch_info Array of PartitionDispatch objects with one - * entry for every partitioned table in the - * partition tree. - * num_dispatch number of partitioned tables in the partition - * tree (= length of partition_dispatch_info[]) - * partition_oids Array of leaf partitions OIDs with one entry - * for every leaf partition in the partition tree, - * initialized in full by - * ExecSetupPartitionTupleRouting. - * partitions Array of ResultRelInfo* objects with one entry - * for every leaf partition in the partition tree, - * initialized lazily by ExecInitPartitionInfo. - * partitions_init Array of ResultRelInfo* objects in the order - * that they were lazily initialized. - * num_partitions Number of leaf partitions in the partition tree - * (= 'partitions_oid'/'partitions' array length) - * num_partitions_init Number of leaf partition lazily setup so far. - * partitions_init_size Size of partitions_init array. - * parent_child_tupconv_maps Array of TupleConversionMap objects with one - * entry for every leaf partition (required to - * convert tuple from the root table's rowtype to - * a leaf partition's rowtype after tuple routing - * is done). Remains NULL if no maps to store. - * child_parent_tupconv_maps Array of TupleConversionMap objects with one - * entry for every leaf partition (required to - * convert an updated tuple from the leaf - * partition's rowtype to the root table's rowtype - * so that tuple routing can be done) - * child_parent_map_not_required Array of bool. True value means that a map is - * determined to be not required for the given - * partition. False means either we haven't yet - * checked if a map is required, or it was - * determined to be required. - * subplan_partition_offsets Integer array ordered by UPDATE subplans. 
Each - * element of this array has the index into the - * corresponding partition in partitions array. - * num_subplan_partition_offsets Length of 'subplan_partition_offsets' array - * partition_tuple_slot TupleTableSlot to be used to manipulate any - * given leaf partition's rowtype after that - * partition is chosen for insertion by - * tuple-routing. - * root_tuple_slot TupleTableSlot to be used to transiently hold - * copy of a tuple that's being moved across - * partitions in the root partitioned table's - * rowtype + * partition_root Root table, that is, the table mentioned in the + * INSERT or UPDATE query or COPY FROM command. + * + * partition_dispatch_info Contains PartitionDispatch objects for every + * partitioned table touched by tuple routing. The + * entry for the root partitioned table is *always* + * present as the first entry of this array. + * + * num_dispatch The number of existing entries; it also serves as + * the index of the next entry to be allocated and + * placed in 'partition_dispatch_info'. + * + * dispatch_allocsize (>= 'num_dispatch') is the number of entries that + * can be stored in 'partition_dispatch_info' before + * needing to reallocate more space. + * + * partitions Contains pointers to the ResultRelInfos of all leaf + * partitions touched by tuple routing. Some of + * these are pointers to "reused" ResultRelInfos, + * that is, those that are created and destroyed + * outside execPartition.c, for example, when tuple + * routing is used for UPDATE queries that modify + * the partition key.
The rest are pointers to + * ResultRelInfos managed by execPartition.c itself + * + * num_partitions The number of existing entries; it also serves as + * the index of the next entry to be allocated and + * placed in 'partitions' + * + * partitions_allocsize (>= 'num_partitions') is the number of entries + * that can be stored in 'partitions' before needing + * to reallocate more space + * + * parent_child_tupconv_maps Contains information to convert tuples of the + * root parent's rowtype to those of the leaf + * partitions' rowtype, but only for those partitions + * whose TupleDescs are physically different from the + * root parent's. If none of the partitions has such + * a differing TupleDesc, then it's NULL. If + * non-NULL, is of the same size as 'partitions', to + * be able to use the same array index. Also, there + * need not be more of these maps than there are + * partitions that were touched. + * + * parent_child_tupconv_maps_allocsize The number of entries that can be + * stored in 'parent_child_tupconv_maps' before + * needing to reallocate more space + * + * partition_tuple_slot This is a tuple slot used to store a tuple using + * the rowtype of the partition chosen by tuple + * routing. Maintained separately because partitions + * may have different rowtypes. + * + * Note: The following fields are used only when UPDATE ends up needing to + * do tuple routing. + * + * child_parent_tupconv_maps Information to convert tuples of the leaf + * partitions' rowtype to the root parent's + * rowtype. These are needed by the transition table + * machinery when storing tuples of a partition's + * rowtype into a transition table that can only + * store tuples of the root parent's rowtype. + * Like 'parent_child_tupconv_maps' it remains NULL + * if none of the partitions selected by tuple + * routing needed a conversion map. Also, if non- + * NULL, is of the same size as 'partitions'.
+ * + * child_parent_map_not_required Records whether a conversion map is known + * to be unnecessary for a partition, so that + * TupConvMapForLeaf can return quickly when set + * + * child_parent_tupconv_maps_allocsize The number of entries that can be + * stored in 'child_parent_tupconv_maps' before + * needing to reallocate more space + * + * subplan_partition_offsets Maps indexes of UPDATE result rels in the + * per-subplan array to the indexes of their + * ResultRelInfos in 'partitions' + * + * num_subplan_partition_offsets The number of entries in + * 'subplan_partition_offsets', which is the same as + * the number of UPDATE result rels + * + * root_tuple_slot During UPDATE tuple routing, this tuple slot is + * used to transiently store a tuple using the root + * table's rowtype after converting it from the + * tuple's source leaf partition's rowtype; that is + * done only if the leaf partition's rowtype differs. *----------------------- */ typedef struct PartitionTupleRouting { + Relation partition_root; + PartitionDispatch *partition_dispatch_info; int num_dispatch; - Oid *partition_oids; + int dispatch_allocsize; + ResultRelInfo **partitions; - ResultRelInfo **partitions_init; int num_partitions; - int num_partitions_init; - int partitions_init_size; + int partitions_allocsize; + TupleConversionMap **parent_child_tupconv_maps; + int parent_child_tupconv_maps_allocsize; + + TupleTableSlot *partition_tuple_slot; + TupleConversionMap **child_parent_tupconv_maps; bool *child_parent_map_not_required; + int child_parent_tupconv_maps_allocsize; + int *subplan_partition_offsets; int num_subplan_partition_offsets; - TupleTableSlot *partition_tuple_slot; + TupleTableSlot *root_tuple_slot; } PartitionTupleRouting; @@ -193,8 +245,9 @@ typedef struct PartitionPruneState extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel); -extern int ExecFindPartition(ResultRelInfo *resultRelInfo, - PartitionDispatch *pd, +extern int
ExecFindPartition(ModifyTableState *mtstate, + ResultRelInfo *resultRelInfo, + PartitionTupleRouting *proute, TupleTableSlot *slot, EState *estate); extern ResultRelInfo *ExecGetPartitionInfo(ModifyTableState *mtstate,