Hi David,

Thanks for taking a look.

On 2018/07/15 17:34, David Rowley wrote:
> I've looked over the code and the ExecUseUpdateResultRelForRouting()
> function is broken.  Your while loop only skips partitions for the
> current partitioned table, it does not skip ModifyTable subnodes that
> belong to other partitioned tables.
> 
> You can use the following. The code does not find the t1_a2 subnode.
> 
> create table t1 (a int, b int) partition by list(a);
> create table t1_a1 partition of t1 for values in(1) partition by list(b);
> create table t1_a2 partition of t1 for values in(2);
> create table t1_a1_b1 partition of t1_a1 for values in(1);
> create table t1_a1_b2 partition of t1_a1 for values in(2);
> insert into t1 values(2,2);
> 
> update t1 set a = a;

Hmm, it indeed is broken.

> I think there might not be enough information to make this work
> correctly, as if you change the loop to skip subnodes, then it won't
> work in cases where the partition[0] was pruned.
> 
> I've another patch sitting here, partly done, that changes
> pg_class.relispartition into pg_class.relpartitionparent.  If we had
> that then we could code your loop to work correctly.
>
> Alternatively, I guess we could just ignore the UPDATE's ResultRelInfos and just
> build new ones. Unsure if there's actually a reason we need to reuse
> the existing ones, is there?

We try to reuse the existing ones because, back when the patch was written
(not by me though), we thought that redoing all the work InitResultRelInfo
does for every partition that already has a result rel would cumulatively be
more expensive than identifying the reusable ones with a single linear scan
of the partition and result rel arrays in parallel.  I don't remember seeing
a benchmark demonstrating that benefit though; maybe one was posted, but I
don't remember having looked at it closely.

> I think you'd need to know the owning partition and skip subnodes that
> don't belong to pd->reldesc. Alternatively, a hashtable could be built
> with all the oids belonging to pd->reldesc, then we could loop over
> the update_rris finding subnodes that can be found in the hashtable.
> Likely this will be much slower than the sort of merge lookup that the
> previous code did.

I think one option is to simply give up on matching *all* UPDATE result rels
that belong to a given partitioned table (pd->reldesc) in one call of
ExecUseUpdateResultRelForRouting.  Instead, pass the index of the partition
(in pd->partdesc->oids) whose ResultRelInfo is wanted, loop over all UPDATE
result rels looking for a match, and return immediately upon finding one,
after storing its pointer in proute->partitions.  In the worst case, we will
end up scanning the UPDATE result rels array once for every partition that
gets touched, but such an UPDATE query is probably uncommon, and even if one
occurs, tuple routing might be the last of its bottlenecks.

I have implemented that approach in the updated patch.
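
To illustrate the idea, here is a condensed sketch of what the patch's
ExecUseUpdateResultRelForRouting now does; it is not the exact patch text
(the real function in the attached diff also records the match in
subplan_partition_offsets and grows the partitions array when needed):

static int
ExecUseUpdateResultRelForRouting(ModifyTableState *mtstate,
                                 PartitionTupleRouting *proute,
                                 PartitionDispatch pd, int partidx)
{
    Oid         partoid = pd->partdesc->oids[partidx];
    ResultRelInfo *update_rris = mtstate->resultRelInfo;
    int         nrels = list_length(((ModifyTable *) mtstate->ps.plan)->plans);
    int         i;

    for (i = 0; i < nrels; i++)
    {
        /* Reuse this UPDATE result rel if it is for the chosen partition. */
        if (RelationGetRelid(update_rris[i].ri_RelationDesc) == partoid)
        {
            int         result = proute->num_partitions++;

            pd->indexes[partidx] = result;
            proute->partitions[result] = &update_rris[i];
            return result;
        }
    }

    return -1;              /* caller must create the ResultRelInfo afresh */
}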

That also means that ExecUseUpdateResultRelForRouting is now called only from
ExecFindPartition, because with the new arrangement it would be useless to
call it from ExecSetupPartitionTupleRouting.  Moreover, an UPDATE may not end
up using tuple routing at all, even though the fact that the partition key is
being updated causes ExecSetupPartitionTupleRouting to be called.

> Another thing that I don't like is the PARTITION_ROUTING_MAXSIZE code.
> The code seems to assume that there can only be at the most 65536
> partitions, but I don't think there's any code which restricts us to
> that. There is code in the planner that will bork when trying to
> create a RangeTblEntry up that high, but as far as I know that won't
> be noticed on the INSERT path.  I don't think this code has any
> business knowing what the special varnos are set to either.  It would
> be better to just remove the limit and suffer the small wasted array
> space.  I understand you've probably coded it like this due to the
> similar code that was in my patch, but with mine I knew the total
> number of partitions. Your patch does not.

OK, I changed it to UINT_MAX.
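
That is, the updated patch now has:

#define PARTITION_ROUTING_INITSIZE     8
#define PARTITION_ROUTING_MAXSIZE      UINT_MAX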

> Other thoughts on the patch:
> 
> I wonder if it's worth having syscache keep a count on the number of
> sub-partitioned tables a partition has. If there are none in the root
> partition then the partition_dispatch_info can be initialized with
> just 1 element to store the root details. Although, maybe it's not
> worth it to reduce the array size by 7 elements.

Hmm yes.  Allocating space for 8 pointers when we really need 1 is not too
bad, if the alternative is to modify partcache.c.

> Also, I'm a bit confused why you change the comments in
> execPartition.h for PartitionTupleRouting to be inline again. I
> brought those out of line as I thought the complexity of the code
> warranted that. You're inlining them again goes against what all the
> other structs do in that file.

They were out of line to begin with, but that started to become distracting
while I was updating the comments.  I agree about being consistent though, so
I have moved them back to where they were; I have also significantly
rewritten those comments to be clearer.

> Apart from that, I think the idea is promising. We'll just need to
> find a way to make ExecUseUpdateResultRelForRouting work correctly.

Let me know what you think of the code in the updated patch.

Thanks,
Amit
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 25bec76c1d..44cf3bba12 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2621,10 +2621,8 @@ CopyFrom(CopyState cstate)
                         * will get us the ResultRelInfo and TupleConversionMap for the
                         * partition, respectively.
                         */
-                       leaf_part_index = ExecFindPartition(resultRelInfo,
-                                                                               proute->partition_dispatch_info,
-                                                                               slot,
-                                                                               estate);
+                       leaf_part_index = ExecFindPartition(mtstate, resultRelInfo,
+                                                                               proute, slot, estate);
                        Assert(leaf_part_index >= 0 &&
                                   leaf_part_index < proute->num_partitions);
 
@@ -2644,10 +2642,8 @@ CopyFrom(CopyState cstate)
                         * to the selected partition.
                         */
                        saved_resultRelInfo = resultRelInfo;
-                       resultRelInfo = ExecGetPartitionInfo(mtstate,
-                                                                                saved_resultRelInfo,
-                                                                                proute, estate,
-                                                                                leaf_part_index);
+                       Assert(proute->partitions[leaf_part_index] != NULL);
+                       resultRelInfo = proute->partitions[leaf_part_index];
 
                        /*
                         * For ExecInsertIndexTuples() to work on the 
partition's indexes
diff --git a/src/backend/executor/execPartition.c 
b/src/backend/executor/execPartition.c
index 1a3a67dd0d..23c766b5fc 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -31,17 +31,19 @@
 #include "utils/rls.h"
 #include "utils/ruleutils.h"
 
-static ResultRelInfo *ExecInitPartitionInfo(ModifyTableState *mtstate,
+#define PARTITION_ROUTING_INITSIZE     8
+#define PARTITION_ROUTING_MAXSIZE      UINT_MAX
+
+static int ExecUseUpdateResultRelForRouting(ModifyTableState *mtstate,
+                                                                PartitionTupleRouting *proute,
+                                                                PartitionDispatch pd, int partidx);
+static int ExecInitPartitionInfo(ModifyTableState *mtstate,
                                          ResultRelInfo *resultRelInfo,
                                          PartitionTupleRouting *proute,
-                                         EState *estate, int partidx);
-static PartitionDispatch *RelationGetPartitionDispatchInfo(Relation rel,
-                                                                int *num_parted, Oid **leaf_part_oids,
-                                                                int *n_leaf_part_oids);
-static void get_partition_dispatch_recurse(Relation rel, Relation parent,
-                                                          List **pds, Oid **leaf_part_oids,
-                                                          int *n_leaf_part_oids,
-                                                          int *leaf_part_oid_size);
+                                         EState *estate,
+                                         PartitionDispatch parent, int partidx);
+static PartitionDispatch ExecInitPartitionDispatchInfo(PartitionTupleRouting *proute,
+                                               Oid partoid, PartitionDispatch parent_pd, int part_index);
 static void FormPartitionKeyDatum(PartitionDispatch pd,
                                          TupleTableSlot *slot,
                                          EState *estate,
@@ -68,127 +70,61 @@ static void find_matching_subplans_recurse(PartitionPruneState *prunestate,
  * Note that all the relations in the partition tree are locked using the
  * RowExclusiveLock mode upon return from this function.
  *
- * While we allocate the arrays of pointers of ResultRelInfo and
- * TupleConversionMap for all partitions here, actual objects themselves are
- * lazily allocated for a given partition if a tuple is actually routed to it;
- * see ExecInitPartitionInfo.  However, if the function is invoked for UPDATE
- * tuple routing, the caller will have already initialized ResultRelInfo's for
- * each partition present in the ModifyTable's subplans. These are reused and
- * assigned to their respective slot in the aforementioned array.  For such
- * partitions, we delay setting up objects such as TupleConversionMap until
- * those are actually chosen as the partitions to route tuples to.  See
- * ExecPrepareTupleRouting.
+ * This is called during the initialization of a COPY FROM command or of an
+ * INSERT/UPDATE query.  We provisionally allocate space to hold
+ * PARTITION_ROUTING_INITSIZE number of PartitionDispatch and ResultRelInfo
+ * pointers in their respective arrays.  The arrays will be doubled in
+ * size via repalloc (subject to a maximum of PARTITION_ROUTING_MAXSIZE
+ * entries) if and when we run out of space, as more partitions need
+ * to be added.  Since we already have the root parent open, its
+ * PartitionDispatch is created here.
+ *
+ * PartitionDispatch object of a non-root partitioned table or ResultRelInfo
+ * of a leaf partition is allocated and added to the respective array when
+ * it is encountered for the first time in ExecFindPartition.  As mentioned
+ * above, we might need to expand the respective array before storing it.
+ *
+ * Tuple conversion maps (either child to parent and/or vice versa) and the
+ * array(s) to hold them are allocated only if needed.
  */
 PartitionTupleRouting *
 ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 {
-       int                     i;
        PartitionTupleRouting *proute;
-       int                     nparts;
        ModifyTable *node = mtstate ? (ModifyTable *) mtstate->ps.plan : NULL;
 
-       /*
-        * Get the information about the partition tree after locking all the
-        * partitions.
-        */
+       /* Lock all the partitions. */
        (void) find_all_inheritors(RelationGetRelid(rel), RowExclusiveLock, NULL);
-       proute = (PartitionTupleRouting *) palloc(sizeof(PartitionTupleRouting));
-       proute->partition_dispatch_info =
-               RelationGetPartitionDispatchInfo(rel, &proute->num_dispatch,
-                                                                                &proute->partition_oids, &nparts);
 
-       proute->num_partitions = nparts;
-       proute->partitions =
-               (ResultRelInfo **) palloc0(nparts * sizeof(ResultRelInfo *));
+       proute = (PartitionTupleRouting *) palloc0(sizeof(PartitionTupleRouting));
+       proute->partition_root = rel;
+       proute->dispatch_allocsize = PARTITION_ROUTING_INITSIZE;
+       proute->partition_dispatch_info = (PartitionDispatchData **)
+                       palloc(sizeof(PartitionDispatchData *) * PARTITION_ROUTING_INITSIZE);
 
        /*
-        * Allocate an array to store ResultRelInfos that we'll later allocate.
-        * It is common that not all partitions will have tuples routed to them,
-        * so we'll refrain from allocating enough space for all partitions here.
-        * Let's just start with something small and make it bigger only when
-        * needed.  Storing these separately rather than relying on the
-        * 'partitions' array allows us to quickly identify which ResultRelInfos we
-        * must teardown at the end.
+        * Initialize this table's PartitionDispatch object.  Since the root
+        * parent doesn't itself have any parent, the last two parameters are
+        * not used.
         */
-       proute->partitions_init_size = Min(nparts, 8);
-
-       proute->partitions_init = (ResultRelInfo **)
-               palloc(proute->partitions_init_size * sizeof(ResultRelInfo *));
-
-       proute->num_partitions_init = 0;
-
-       /* We only allocate this when we need to store the first non-NULL map */
-       proute->parent_child_tupconv_maps = NULL;
-
-       proute->child_parent_tupconv_maps = NULL;
-
+       (void) ExecInitPartitionDispatchInfo(proute, RelationGetRelid(rel), NULL,
+                                                                                0);
+       proute->num_dispatch = 1;
+       proute->partitions_allocsize = PARTITION_ROUTING_INITSIZE;
+       proute->partitions = (ResultRelInfo **)
+                       palloc(sizeof(ResultRelInfo *) * PARTITION_ROUTING_INITSIZE);
+       proute->num_partitions = 0;
 
        /*
-        * Initialize an empty slot that will be used to manipulate tuples of any
-        * given partition's rowtype.  It is attached to the caller-specified node
-        * (such as ModifyTableState) and released when the node finishes
-        * processing.
+        * If UPDATE needs to do tuple routing, we'll need a slot that will
+        * transiently store the tuple being routed using the root parent's
+        * rowtype.  We must set up at least this slot, because it's needed even
+        * before tuple routing begins.  Other necessary information is
+        * initialized when tuple routing code calls
+        * ExecUseUpdateResultRelForRouting.
         */
-       proute->partition_tuple_slot = MakeTupleTableSlot(NULL);
-
-       /* Set up details specific to the type of tuple routing we are doing. */
        if (node && node->operation == CMD_UPDATE)
-       {
-               ResultRelInfo *update_rri = NULL;
-               int                     num_update_rri = 0,
-                                       update_rri_index = 0;
-
-               update_rri = mtstate->resultRelInfo;
-               num_update_rri = list_length(node->plans);
-               proute->subplan_partition_offsets =
-                       palloc(num_update_rri * sizeof(int));
-               proute->num_subplan_partition_offsets = num_update_rri;
-
                proute->root_tuple_slot = MakeTupleTableSlot(NULL);
-
-               for (i = 0; i < nparts; i++)
-               {
-                       Oid                     leaf_oid = proute->partition_oids[i];
-
-                       /*
-                        * If the leaf partition is already present in the per-subplan
-                        * result rels, we re-use that rather than initialize a new result
-                        * rel. The per-subplan resultrels and the resultrels of the leaf
-                        * partitions are both in the same canonical order. So while going
-                        * through the leaf partition oids, we need to keep track of the
-                        * next per-subplan result rel to be looked for in the leaf
-                        * partition resultrels.
-                        */
-                       if (update_rri_index < num_update_rri &&
-                               RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) == leaf_oid)
-                       {
-                               ResultRelInfo *leaf_part_rri;
-
-                               leaf_part_rri = &update_rri[update_rri_index];
-
-                               /*
-                                * This is required in order to convert the partition's tuple
-                                * to be compatible with the root partitioned table's tuple
-                                * descriptor.  When generating the per-subplan result rels,
-                                * this was not set.
-                                */
-                               leaf_part_rri->ri_PartitionRoot = rel;
-
-                               /* Remember the subplan offset for this ResultRelInfo */
-                               proute->subplan_partition_offsets[update_rri_index] = i;
-
-                               update_rri_index++;
-
-                               proute->partitions[i] = leaf_part_rri;
-                       }
-               }
-
-               /*
-                * We should have found all the per-subplan resultrels in the leaf
-                * partitions.
-                */
-               Assert(update_rri_index == num_update_rri);
-       }
        else
        {
                proute->root_tuple_slot = NULL;
@@ -196,26 +132,38 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
                proute->num_subplan_partition_offsets = 0;
        }
 
+       /* We only allocate this when we need to store the first non-NULL map */
+       proute->parent_child_tupconv_maps = NULL;
+       proute->child_parent_tupconv_maps = NULL;
+
+       /*
+        * Initialize an empty slot that will be used to manipulate tuples of any
+        * given partition's rowtype.
+        */
+       proute->partition_tuple_slot = MakeTupleTableSlot(NULL);
+
        return proute;
 }
 
 /*
- * ExecFindPartition -- Find a leaf partition in the partition tree rooted
- * at parent, for the heap tuple contained in *slot
+ * ExecFindPartition -- Find a leaf partition for the tuple contained in *slot
  *
  * estate must be non-NULL; we'll need it to compute any expressions in the
  * partition key(s)
  *
  * If no leaf partition is found, this routine errors out with the appropriate
- * error message, else it returns the leaf partition sequence number
- * as an index into the array of (ResultRelInfos of) all leaf partitions in
- * the partition tree.
+ * error message, else it returns the index of the leaf partition's
+ * ResultRelInfo in the proute->partitions array.
  */
 int
-ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd,
+ExecFindPartition(ModifyTableState *mtstate,
+                                 ResultRelInfo *resultRelInfo,
+                                 PartitionTupleRouting *proute,
                                  TupleTableSlot *slot, EState *estate)
 {
-       int                     result;
+       ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
+       PartitionDispatch *pd = proute->partition_dispatch_info;
+       int                     result = -1;
        Datum           values[PARTITION_MAX_KEYS];
        bool            isnull[PARTITION_MAX_KEYS];
        Relation        rel;
@@ -272,10 +220,7 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd,
                 * partitions to begin with.
                 */
                if (partdesc->nparts == 0)
-               {
-                       result = -1;
                        break;
-               }
 
                cur_index = get_partition_for_tuple(rel, values, isnull);
 
@@ -285,17 +230,64 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd,
                 * next parent to find a partition of.
                 */
                if (cur_index < 0)
-               {
-                       result = -1;
                        break;
-               }
-               else if (parent->indexes[cur_index] >= 0)
+
+               if (partdesc->is_leaf[cur_index])
                {
-                       result = parent->indexes[cur_index];
+                       /* Get the ResultRelInfo of this leaf partition. */
+                       if (parent->indexes[cur_index] >= 0)
+                       {
+                               /*
+                                * Already assigned (either created fresh or reused from the
+                                * set of UPDATE result rels.)
+                                */
+                               Assert(parent->indexes[cur_index] < proute->num_partitions);
+                               result = parent->indexes[cur_index];
+                       }
+                       else if (node && node->operation == CMD_UPDATE)
+                       {
+                               /* Try to assign an existing result rel for tuple routing. */
+                               result = ExecUseUpdateResultRelForRouting(mtstate, proute,
+                                                                                                                 parent, cur_index);
+
+                               /* We may not really have found one. */
+                               Assert(result < 0 ||
+                                          parent->indexes[cur_index] < proute->num_partitions);
+                       }
+
+                       /* We need to create one afresh. */
+                       if (result < 0)
+                       {
+                               result = ExecInitPartitionInfo(mtstate, resultRelInfo,
+                                                                                          proute, estate,
+                                                                                          parent, cur_index);
+                               Assert(result >= 0 && result < proute->num_partitions);
+                       }
                        break;
                }
                else
-                       parent = pd[-parent->indexes[cur_index]];
+               {
+                       /* Get the PartitionDispatch of this parent. */
+                       if (parent->indexes[cur_index] >= 0)
+                       {
+                               /* Already allocated. */
+                               Assert(parent->indexes[cur_index] < proute->num_dispatch);
+                               parent = pd[parent->indexes[cur_index]];
+                       }
+                       else
+                       {
+                               /* Not yet, allocate one. */
+                               PartitionDispatch new_parent;
+
+                               new_parent =
+                                       ExecInitPartitionDispatchInfo(proute,
+                                                                                                 partdesc->oids[cur_index],
+                                                                                                 parent, cur_index);
+                               Assert(parent->indexes[cur_index] >= 0 &&
+                                          parent->indexes[cur_index] < proute->num_dispatch);
+                               parent = new_parent;
+                       }
+               }
        }
 
        /* A partition was not found. */
@@ -318,65 +310,110 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd,
 }
 
 /*
- * ExecGetPartitionInfo
- *             Fetch ResultRelInfo for partidx
+ * ExecUseUpdateResultRelForRouting
+ *             Checks if any of the ResultRelInfos created by ExecInitModifyTable
+ *             belongs to the passed-in partition, and if so, stores its pointer
+ *             in proute so that it can be used as the target of tuple routing
  *
- * Sets up ResultRelInfo, if not done already.
+ * Return value is the index at which the found result rel is stored in proute
+ * or -1 if none found.
  */
-ResultRelInfo *
-ExecGetPartitionInfo(ModifyTableState *mtstate,
-                                        ResultRelInfo *resultRelInfo,
-                                        PartitionTupleRouting *proute,
-                                        EState *estate, int partidx)
+static int
+ExecUseUpdateResultRelForRouting(ModifyTableState *mtstate,
+                                                                PartitionTupleRouting *proute,
+                                                                PartitionDispatch pd,
+                                                                int partidx)
 {
-       ResultRelInfo *result = proute->partitions[partidx];
+       Oid                             partoid = pd->partdesc->oids[partidx];
+       ModifyTable        *node = (ModifyTable *) mtstate->ps.plan;
+       ResultRelInfo  *update_result_rels = NULL;
+       int                             num_update_result_rels = 0;
+       int                             i;
+       int                             part_result_rel_index = -1;
 
-       if (result)
-               return result;
+       update_result_rels = mtstate->resultRelInfo;
+       num_update_result_rels = list_length(node->plans);
 
-       result = ExecInitPartitionInfo(mtstate,
-                                                                  
resultRelInfo,
-                                                                  proute,
-                                                                  estate,
-                                                                  partidx);
-       Assert(result);
-
-       proute->partitions[partidx] = result;
-
-       /*
-        * Record the ones setup so far in setup order.  This makes the cleanup
-        * operation more efficient when very few have been setup.
-        */
-       if (proute->num_partitions_init == proute->partitions_init_size)
+       /* If here for the first time, initialize necessary info in proute. */
+       if (proute->subplan_partition_offsets == NULL)
        {
-               /* First allocate more space if the array is not large enough */
-               proute->partitions_init_size =
-                       Min(proute->partitions_init_size * 2, proute->num_partitions);
-
-               proute->partitions_init = (ResultRelInfo **)
-                               repalloc(proute->partitions_init,
-                               proute->partitions_init_size * sizeof(ResultRelInfo *));
+               proute->subplan_partition_offsets =
+                               palloc(num_update_result_rels * sizeof(int));
+               memset(proute->subplan_partition_offsets, -1,
+                          num_update_result_rels * sizeof(int));
+               proute->num_subplan_partition_offsets = num_update_result_rels;
        }
 
-       proute->partitions_init[proute->num_partitions_init++] = result;
+       /*
+        * Go through UPDATE result rels and save the pointers of those that
+        * belong to this table's partitions in proute.
+        */
+       for (i = 0; i < num_update_result_rels; i++)
+       {
+               ResultRelInfo *update_result_rel = &update_result_rels[i];
 
-       Assert(proute->num_partitions_init <= proute->num_partitions);
+               if (partoid != RelationGetRelid(update_result_rel->ri_RelationDesc))
+                       continue;
 
-       return result;
+               /* Found it. */
+
+               /*
+                * This is required in order to convert the partition's tuple
+                * to be compatible with the root partitioned table's tuple
+                * descriptor.  When generating the per-subplan result rels,
+                * this was not set.
+                */
+               update_result_rel->ri_PartitionRoot = proute->partition_root;
+
+               /*
+                * Remember the index of this UPDATE result rel in the tuple
+                * routing partition array.
+                */
+               proute->subplan_partition_offsets[i] = proute->num_partitions;
+
+               /*
+                * Also, record in PartitionDispatch that we have a valid
+                * ResultRelInfo for this partition.
+                */
+               Assert(pd->indexes[partidx] == -1);
+               part_result_rel_index = proute->num_partitions++;
+               if (part_result_rel_index >= PARTITION_ROUTING_MAXSIZE)
+                       elog(ERROR, "invalid partition index: %u", part_result_rel_index);
+               pd->indexes[partidx] = part_result_rel_index;
+               if (part_result_rel_index >= proute->partitions_allocsize)
+               {
+                       /* Expand allocated space. */
+                       proute->partitions_allocsize =
+                               Min(proute->partitions_allocsize * 2,
+                                       PARTITION_ROUTING_MAXSIZE);
+                       proute->partitions = (ResultRelInfo **)
+                               repalloc(proute->partitions,
+                                                sizeof(ResultRelInfo *) *
+                                                               proute->partitions_allocsize);
+               }
+               proute->partitions[part_result_rel_index] = update_result_rel;
+               break;
+       }
+
+       return part_result_rel_index;
 }
 
 /*
  * ExecInitPartitionInfo
  *             Initialize ResultRelInfo and other information for a partition
  *
- * Returns the ResultRelInfo
+ * This also stores it in the proute->partitions array at the next
+ * available index, possibly expanding the array if there isn't any space
+ * left in it, and returns the index where it's stored.
  */
-static ResultRelInfo *
+static int
 ExecInitPartitionInfo(ModifyTableState *mtstate,
                                          ResultRelInfo *resultRelInfo,
                                          PartitionTupleRouting *proute,
-                                         EState *estate, int partidx)
+                                         EState *estate,
+                                         PartitionDispatch parent, int partidx)
 {
+       Oid                     partoid = parent->partdesc->oids[partidx];
        ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
        Relation        rootrel = resultRelInfo->ri_RelationDesc,
                                partrel;
@@ -385,12 +422,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
        MemoryContext oldContext;
        AttrNumber *part_attnos = NULL;
        bool            found_whole_row;
+       int                     part_result_rel_index;
 
        /*
         * We locked all the partitions in ExecSetupPartitionTupleRouting
         * including the leaf partitions.
         */
-       partrel = heap_open(proute->partition_oids[partidx], NoLock);
+       partrel = heap_open(partoid, NoLock);
 
        /*
         * Keep ResultRelInfo and other information for this partition in the
@@ -566,8 +604,23 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
                                                                        &mtstate->ps, RelationGetDescr(partrel));
        }
 
+       part_result_rel_index = proute->num_partitions++;
+       if (part_result_rel_index >= PARTITION_ROUTING_MAXSIZE)
+               elog(ERROR, "invalid partition index: %u", part_result_rel_index);
+       parent->indexes[partidx] = part_result_rel_index;
+       if (part_result_rel_index >= proute->partitions_allocsize)
+       {
+               /* Expand allocated space. */
+               proute->partitions_allocsize =
+                       Min(proute->partitions_allocsize * 2, PARTITION_ROUTING_MAXSIZE);
+               proute->partitions = (ResultRelInfo **)
+                       repalloc(proute->partitions,
+                                        sizeof(ResultRelInfo *) * proute->partitions_allocsize);
+       }
+
        /* Set up information needed for routing tuples to the partition. */
-       ExecInitRoutingInfo(mtstate, estate, proute, leaf_part_rri, partidx);
+       ExecInitRoutingInfo(mtstate, estate, proute, leaf_part_rri,
+                                               part_result_rel_index);
 
        /*
         * If there is an ON CONFLICT clause, initialize state for it.
@@ -626,7 +679,8 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
                        TupleConversionMap *map;
 
                        map = proute->parent_child_tupconv_maps ?
-                               proute->parent_child_tupconv_maps[partidx] : NULL;
+                               proute->parent_child_tupconv_maps[part_result_rel_index] :
+                               NULL;
 
                        Assert(node->onConflictSet != NIL);
                        Assert(resultRelInfo->ri_onConflict != NULL);
@@ -729,12 +783,12 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
                }
        }
 
-       Assert(proute->partitions[partidx] == NULL);
-       proute->partitions[partidx] = leaf_part_rri;
+       /* Save here for later use. */
+       proute->partitions[part_result_rel_index] = leaf_part_rri;
 
        MemoryContextSwitchTo(oldContext);
 
-       return leaf_part_rri;
+       return part_result_rel_index;
 }
 
 /*
@@ -766,10 +820,26 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 
        if (map)
        {
+               int             new_size;
+
                /* Allocate parent child map array only if we need to store a map */
-               if (!proute->parent_child_tupconv_maps)
+               if (proute->parent_child_tupconv_maps == NULL)
+               {
+                       proute->parent_child_tupconv_maps_allocsize = new_size =
+                               PARTITION_ROUTING_INITSIZE;
                        proute->parent_child_tupconv_maps = (TupleConversionMap 
**)
-                               palloc0(proute->num_partitions * 
sizeof(TupleConversionMap *));
+                               palloc0(sizeof(TupleConversionMap *) * 
new_size);
+               }
+               /* We may have run out of the initially allocated space. */
+               else if (partidx >= proute->parent_child_tupconv_maps_allocsize)
+               {
+                       proute->parent_child_tupconv_maps_allocsize = new_size =
+                               Min(proute->parent_child_tupconv_maps_allocsize * 2,
+                                       PARTITION_ROUTING_MAXSIZE);
+                       proute->parent_child_tupconv_maps = (TupleConversionMap **)
+                               repalloc(proute->parent_child_tupconv_maps,
+                                                sizeof(TupleConversionMap *) * new_size);
+               }
 
                proute->parent_child_tupconv_maps[partidx] = map;
        }
@@ -788,6 +858,91 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 }
 
 /*
+ * ExecInitPartitionDispatchInfo
+ *             Initialize PartitionDispatch for a partitioned table
+ *
+ * This also stores it in the proute->partition_dispatch_info array at the
+ * specified index ('dispatchidx'), possibly expanding the array if there
+ * isn't space left in it.
+ */
+static PartitionDispatch
+ExecInitPartitionDispatchInfo(PartitionTupleRouting *proute, Oid partoid,
+                                                         PartitionDispatch parent_pd, int part_index)
+{
+       Relation        rel;
+       TupleDesc       tupdesc;
+       PartitionDesc partdesc;
+       PartitionKey partkey;
+       PartitionDispatch pd;
+       int                     dispatchidx;
+
+       if (partoid != RelationGetRelid(proute->partition_root))
+               rel = heap_open(partoid, NoLock);
+       else
+               rel = proute->partition_root;
+       tupdesc = RelationGetDescr(rel);
+       partdesc = RelationGetPartitionDesc(rel);
+       partkey = RelationGetPartitionKey(rel);
+
+       pd = (PartitionDispatch) palloc(sizeof(PartitionDispatchData));
+       pd->reldesc = rel;
+       pd->key = partkey;
+       pd->keystate = NIL;
+       pd->partdesc = partdesc;
+       if (parent_pd != NULL)
+       {
+               /*
+                * For every partitioned table other than the root, we must store a
+                * tuple table slot initialized with its tuple descriptor and a tuple
+                * conversion map to convert a tuple from its parent's rowtype to its
+                * own. That is to make sure that we are looking at the correct row
+                * using the correct tuple descriptor when computing its partition key
+                * for tuple routing.
+                */
+               pd->tupslot = MakeSingleTupleTableSlot(tupdesc);
+               pd->tupmap =
+                               convert_tuples_by_name(RelationGetDescr(parent_pd->reldesc),
+                                                                          tupdesc,
+                                                                          gettext_noop("could not convert row type"));
+       }
+       else
+       {
+               /* Not required for the root partitioned table */
+               pd->tupslot = NULL;
+               pd->tupmap = NULL;
+       }
+
+       pd->indexes = (int *) palloc(sizeof(int) * partdesc->nparts);
+
+       /*
+        * Initialize with -1 to signify that the corresponding partition's
+        * ResultRelInfo or PartitionDispatch has not been created yet.
+        */
+       memset(pd->indexes, -1, sizeof(int) * partdesc->nparts);
+
+       dispatchidx = proute->num_dispatch++;
+       if (dispatchidx >= PARTITION_ROUTING_MAXSIZE)
+               elog(ERROR, "invalid partition index: %u", dispatchidx);
+       if (parent_pd)
+               parent_pd->indexes[part_index] = dispatchidx;
+       if (dispatchidx >= proute->dispatch_allocsize)
+       {
+               /* Expand allocated space. */
+               proute->dispatch_allocsize =
+                       Min(proute->dispatch_allocsize * 2, PARTITION_ROUTING_MAXSIZE);
+               proute->partition_dispatch_info = (PartitionDispatchData **)
+                       repalloc(proute->partition_dispatch_info,
+                                        sizeof(PartitionDispatchData *) *
+                                        proute->dispatch_allocsize);
+       }
+
+       /* Save here for later use. */
+       proute->partition_dispatch_info[dispatchidx] = pd;
+
+       return pd;
+}
+
+/*
  * ExecSetupChildParentMapForLeaf -- Initialize the per-leaf-partition
  * child-to-root tuple conversion map array.
  *
@@ -805,13 +960,14 @@ ExecSetupChildParentMapForLeaf(PartitionTupleRouting *proute)
         * These array elements get filled up with maps on an on-demand basis.
         * Initially just set all of them to NULL.
         */
+       proute->child_parent_tupconv_maps_allocsize = PARTITION_ROUTING_INITSIZE;
        proute->child_parent_tupconv_maps =
                (TupleConversionMap **) palloc0(sizeof(TupleConversionMap *) *
-                                                                               proute->num_partitions);
+                                                                               PARTITION_ROUTING_INITSIZE);
 
        /* Same is the case for this array. All the values are set to false */
        proute->child_parent_map_not_required =
-               (bool *) palloc0(sizeof(bool) * proute->num_partitions);
+               (bool *) palloc0(sizeof(bool) * PARTITION_ROUTING_INITSIZE);
 }
 
 /*
@@ -826,8 +982,9 @@ TupConvMapForLeaf(PartitionTupleRouting *proute,
        TupleConversionMap **map;
        TupleDesc       tupdesc;
 
-       /* Don't call this if we're not supposed to be using this type of map. */
-       Assert(proute->child_parent_tupconv_maps != NULL);
+       /* If nobody else set up the per-leaf maps array, do so ourselves. */
+       if (proute->child_parent_tupconv_maps == NULL)
+               ExecSetupChildParentMapForLeaf(proute);
 
        /* If it's already known that we don't need a map, return NULL. */
        if (proute->child_parent_map_not_required[leaf_index])
@@ -846,6 +1003,30 @@ TupConvMapForLeaf(PartitionTupleRouting *proute,
                                                           gettext_noop("could 
not convert row type"));
 
        /* If it turns out no map is needed, remember for next time. */
+
+       /* We may have run out of the initially allocated space. */
+       if (leaf_index >= proute->child_parent_tupconv_maps_allocsize)
+       {
+               int             new_size,
+                               old_size;
+
+               old_size = proute->child_parent_tupconv_maps_allocsize;
+               proute->child_parent_tupconv_maps_allocsize = new_size =
+                       Min(proute->child_parent_tupconv_maps_allocsize * 2,
+                               PARTITION_ROUTING_MAXSIZE);
+               proute->child_parent_tupconv_maps = (TupleConversionMap **)
+                       repalloc(proute->child_parent_tupconv_maps,
+                                        sizeof(TupleConversionMap *) * new_size);
+               memset(proute->child_parent_tupconv_maps + old_size, 0,
+                          sizeof(TupleConversionMap *) * (new_size - old_size));
+
+               proute->child_parent_map_not_required = (bool *)
+                       repalloc(proute->child_parent_map_not_required,
+                                        sizeof(bool) * new_size);
+               memset(proute->child_parent_map_not_required + old_size, false,
+                          sizeof(bool) * (new_size - old_size));
+       }
+
        proute->child_parent_map_not_required[leaf_index] = (*map == NULL);
 
        return *map;
@@ -909,9 +1090,9 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
                ExecDropSingleTupleTableSlot(pd->tupslot);
        }
 
-       for (i = 0; i < proute->num_partitions_init; i++)
+       for (i = 0; i < proute->num_partitions; i++)
        {
-               ResultRelInfo *resultRelInfo = proute->partitions_init[i];
+               ResultRelInfo *resultRelInfo = proute->partitions[i];
 
                /* Allow any FDWs to shut down if they've been exercised */
                if (resultRelInfo->ri_PartitionReadyForRouting &&
@@ -920,6 +1101,28 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
                        resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
                                                                                                   resultRelInfo);
 
+               /*
+                * Check if this result rel is one of the UPDATE subplan result
+                * rels; if so, let ExecEndPlan() close it.
+                */
+               if (proute->subplan_partition_offsets)
+               {
+                       int             j;
+                       bool            found = false;
+
+                       for (j = 0; j < proute->num_subplan_partition_offsets; j++)
+                       {
+                               if (proute->subplan_partition_offsets[j] == i)
+                               {
+                                       found = true;
+                                       break;
+                               }
+                       }
+
+                       if (found)
+                               continue;
+               }
+
                ExecCloseIndices(resultRelInfo);
                heap_close(resultRelInfo->ri_RelationDesc, NoLock);
        }
@@ -931,211 +1134,6 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
                ExecDropSingleTupleTableSlot(proute->partition_tuple_slot);
 }
 
-/*
- * RelationGetPartitionDispatchInfo
- *             Returns an array of PartitionDispatch as is required for routing
- *             tuples to the correct partition.
- *
- * 'num_parted' is set to the size of the returned array and the
- *'leaf_part_oids' array is allocated and populated with each leaf partition
- * Oid in the hierarchy. 'n_leaf_part_oids' is set to the size of that array.
- * All the relations in the partition tree (including 'rel') must have been
- * locked (using at least the AccessShareLock) by the caller.
- */
-static PartitionDispatch *
-RelationGetPartitionDispatchInfo(Relation rel,
-                                                                int *num_parted, Oid **leaf_part_oids,
-                                                                int *n_leaf_part_oids)
-{
-       List       *pdlist = NIL;
-       PartitionDispatchData **pd;
-       ListCell   *lc;
-       int                     i;
-       int                     leaf_part_oid_size;
-
-       Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
-
-       *num_parted = 0;
-       *n_leaf_part_oids = 0;
-
-       leaf_part_oid_size = 0;
-       *leaf_part_oids = NULL;
-
-       get_partition_dispatch_recurse(rel, NULL, &pdlist, leaf_part_oids,
-                                                                  n_leaf_part_oids, &leaf_part_oid_size);
-       *num_parted = list_length(pdlist);
-       pd = (PartitionDispatchData **) palloc(*num_parted *
-                                                                                  sizeof(PartitionDispatchData *));
-       i = 0;
-       foreach(lc, pdlist)
-       {
-               pd[i++] = lfirst(lc);
-       }
-
-       return pd;
-}
-
-/*
- * get_partition_dispatch_recurse
- *             Recursively expand partition tree rooted at rel
- *
- * As the partition tree is expanded in a depth-first manner, we populate
- * '*pds' with PartitionDispatch objects of each partitioned table we find,
- * and populate leaf_part_oids with each leaf partition OID found.
- *
- * Note that the order of OIDs of leaf partitions in leaf_part_oids matches
- * the order in which the planner's expand_partitioned_rtentry() processes
- * them.  It's not necessarily the case that the offsets match up exactly,
- * because constraint exclusion might prune away some partitions on the
- * planner side, whereas we'll always have the complete list; but unpruned
- * partitions will appear in the same order in the plan as they are returned
- * here.
- *
- * Note: Callers must not attempt to pfree the 'leaf_part_oids' array.
- */
-static void
-get_partition_dispatch_recurse(Relation rel, Relation parent,
-                                                          List **pds, Oid **leaf_part_oids,
-                                                          int *n_leaf_part_oids,
-                                                          int *leaf_part_oid_size)
-{
-       TupleDesc       tupdesc = RelationGetDescr(rel);
-       PartitionDesc partdesc = RelationGetPartitionDesc(rel);
-       PartitionKey partkey = RelationGetPartitionKey(rel);
-       PartitionDispatch pd;
-       int                     i;
-       int                     nparts;
-       int                     oid_array_used;
-       int                     oid_array_size;
-       Oid                *oid_array;
-       Oid                *partdesc_oids;
-       bool       *partdesc_subpartitions;
-       int                *indexes;
-
-       check_stack_depth();
-
-       /* Build a PartitionDispatch for this table and add it to *pds. */
-       pd = (PartitionDispatch) palloc(sizeof(PartitionDispatchData));
-       *pds = lappend(*pds, pd);
-       pd->reldesc = rel;
-       pd->key = partkey;
-       pd->keystate = NIL;
-       pd->partdesc = partdesc;
-       if (parent != NULL)
-       {
-               /*
-                * For every partitioned table other than the root, we must store a
-                * tuple table slot initialized with its tuple descriptor and a tuple
-                * conversion map to convert a tuple from its parent's rowtype to its
-                * own. That is to make sure that we are looking at the correct row
-                * using the correct tuple descriptor when computing its partition key
-                * for tuple routing.
-                */
-               pd->tupslot = MakeSingleTupleTableSlot(tupdesc);
-               pd->tupmap = convert_tuples_by_name(RelationGetDescr(parent),
-                                                                                       tupdesc,
-                                                                                       gettext_noop("could not convert row type"));
-       }
-       else
-       {
-               /* Not required for the root partitioned table */
-               pd->tupslot = NULL;
-               pd->tupmap = NULL;
-
-               /*
-                * If the parent has no sub partitions then we can skip calculating
-                * all the leaf partitions and just return all the oids at this level.
-                * In this case, the indexes were also pre-calculated for us by the
-                * syscache code.
-                */
-               if (!partdesc->hassubpart)
-               {
-                       *leaf_part_oids = partdesc->oids;
-                       /* XXX or should we memcpy this out of syscache? */
-                       pd->indexes = partdesc->indexes;
-                       *n_leaf_part_oids = partdesc->nparts;
-                       return;
-               }
-       }
-
-       /*
-        * Go look at each partition of this table.  If it's a leaf partition,
-        * simply add its OID to *leaf_part_oids.  If it's a partitioned table,
-        * recursively call get_partition_dispatch_recurse(), so that its
-        * partitions are processed as well and a corresponding PartitionDispatch
-        * object gets added to *pds.
-        *
-        * The 'indexes' array is used when searching for a partition matching a
-        * given tuple.  The actual value we store here depends on whether the
-        * array element belongs to a leaf partition or a subpartitioned table.
-        * For leaf partitions we store the index into *leaf_part_oids, and for
-        * sub-partitioned tables we store a negative version of the index into
-        * the *pds list.  Both indexes are 0-based, but the first element of the
-        * *pds list is the root partition, so 0 always means the first leaf. When
-        * searching, if we see a negative value, the search must continue in the
-        * corresponding sub-partition; otherwise, we've identified the correct
-        * partition.
-        */
-       oid_array_used = *n_leaf_part_oids;
-       oid_array_size = *leaf_part_oid_size;
-       oid_array = *leaf_part_oids;
-       nparts = partdesc->nparts;
-
-       if (!oid_array)
-       {
-               oid_array_size = *leaf_part_oid_size = nparts;
-               *leaf_part_oids = (Oid *) palloc(sizeof(Oid) * nparts);
-               oid_array = *leaf_part_oids;
-       }
-
-       partdesc_oids = partdesc->oids;
-       partdesc_subpartitions = partdesc->subpartitions;
-
-       pd->indexes = indexes = (int *) palloc(nparts * sizeof(int));
-
-       for (i = 0; i < nparts; i++)
-       {
-               Oid                     partrelid = partdesc_oids[i];
-
-               if (!partdesc_subpartitions[i])
-               {
-                       if (oid_array_size <= oid_array_used)
-                       {
-                               oid_array_size *= 2;
-                               oid_array = (Oid *) repalloc(oid_array,
-                                                                                        sizeof(Oid) * oid_array_size);
-                       }
-
-                       oid_array[oid_array_used] = partrelid;
-                       indexes[i] = oid_array_used++;
-               }
-               else
-               {
-                       /*
-                        * We assume all tables in the partition tree were already locked
-                        * by the caller.
-                        */
-                       Relation        partrel = heap_open(partrelid, NoLock);
-
-                       *n_leaf_part_oids = oid_array_used;
-                       *leaf_part_oid_size = oid_array_size;
-                       *leaf_part_oids = oid_array;
-
-                       indexes[i] = -list_length(*pds);
-                       get_partition_dispatch_recurse(partrel, rel, pds, leaf_part_oids,
-                                                                                  n_leaf_part_oids, leaf_part_oid_size);
-
-                       oid_array_used = *n_leaf_part_oids;
-                       oid_array_size = *leaf_part_oid_size;
-                       oid_array = *leaf_part_oids;
-               }
-       }
-
-       *n_leaf_part_oids = oid_array_used;
-       *leaf_part_oid_size = oid_array_size;
-       *leaf_part_oids = oid_array;
-}
-
 /* ----------------
  *             FormPartitionKeyDatum
  *                     Construct values[] and isnull[] arrays for the partition key
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 07b5f968aa..8b671c6426 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -68,7 +68,6 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
                                                ResultRelInfo *targetRelInfo,
                                                TupleTableSlot *slot);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
-static void ExecSetupChildParentMapForTcs(ModifyTableState *mtstate);
 static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
 static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
                                                int whichplan);
@@ -1666,7 +1665,7 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
        if (mtstate->mt_transition_capture != NULL ||
                mtstate->mt_oc_transition_capture != NULL)
        {
-               ExecSetupChildParentMapForTcs(mtstate);
+               ExecSetupChildParentMapForSubplan(mtstate);
 
                /*
                 * Install the conversion map for the first plan for UPDATE and DELETE
@@ -1709,15 +1708,12 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
         * value is to be used as an index into the arrays for the ResultRelInfo
         * and TupleConversionMap for the partition.
         */
-       partidx = ExecFindPartition(targetRelInfo,
-                                                               proute->partition_dispatch_info,
-                                                               slot,
-                                                               estate);
+       partidx = ExecFindPartition(mtstate, targetRelInfo, proute, slot, estate);
        Assert(partidx >= 0 && partidx < proute->num_partitions);
 
        /* Get the ResultRelInfo corresponding to the selected partition. */
-       partrel = ExecGetPartitionInfo(mtstate, targetRelInfo, proute, estate,
-                                                                  partidx);
+       Assert(proute->partitions[partidx] != NULL);
+       partrel = proute->partitions[partidx];
 
        /*
         * Check whether the partition is routable if we didn't yet
@@ -1825,17 +1821,6 @@ ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate)
        int                     i;
 
        /*
-        * First check if there is already a per-subplan array allocated. Even if
-        * there is already a per-leaf map array, we won't require a per-subplan
-        * one, since we will use the subplan offset array to convert the subplan
-        * index to per-leaf index.
-        */
-       if (mtstate->mt_per_subplan_tupconv_maps ||
-               (mtstate->mt_partition_tuple_routing &&
-                mtstate->mt_partition_tuple_routing->child_parent_tupconv_maps))
-               return;
-
-       /*
         * Build array of conversion maps from each child's TupleDesc to the one
         * used in the target relation.  The map pointers may be NULL when no
         * conversion is necessary, which is hopefully a common case.
@@ -1857,78 +1842,17 @@ ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate)
 }
 
 /*
- * Initialize the child-to-root tuple conversion map array required for
- * capturing transition tuples.
- *
- * The map array can be indexed either by subplan index or by leaf-partition
- * index.  For transition tables, we need a subplan-indexed access to the map,
- * and where tuple-routing is present, we also require a leaf-indexed access.
- */
-static void
-ExecSetupChildParentMapForTcs(ModifyTableState *mtstate)
-{
-       PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-
-       /*
-        * If partition tuple routing is set up, we will require partition-indexed
-        * access. In that case, create the map array indexed by partition; we
-        * will still be able to access the maps using a subplan index by
-        * converting the subplan index to a partition index using
-        * subplan_partition_offsets. If tuple routing is not set up, it means we
-        * don't require partition-indexed access. In that case, create just a
-        * subplan-indexed map.
-        */
-       if (proute)
-       {
-               /*
-                * If a partition-indexed map array is to be created, the subplan map
-                * array has to be NULL.  If the subplan map array is already created,
-                * we won't be able to access the map using a partition index.
-                */
-               Assert(mtstate->mt_per_subplan_tupconv_maps == NULL);
-
-               ExecSetupChildParentMapForLeaf(proute);
-       }
-       else
-               ExecSetupChildParentMapForSubplan(mtstate);
-}
-
-/*
  * For a given subplan index, get the tuple conversion map.
  */
 static TupleConversionMap *
 tupconv_map_for_subplan(ModifyTableState *mtstate, int whichplan)
 {
-       /*
-        * If a partition-index tuple conversion map array is allocated, we need
-        * to first get the index into the partition array. Exactly *one* of the
-        * two arrays is allocated. This is because if there is a partition array
-        * required, we don't require subplan-indexed array since we can translate
-        * subplan index into partition index. And, we create a subplan-indexed
-        * array *only* if partition-indexed array is not required.
-        */
+       /* If nobody else set the per-subplan array of maps, do so ourselves. */
        if (mtstate->mt_per_subplan_tupconv_maps == NULL)
-       {
-               int                     leaf_index;
-               PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+               ExecSetupChildParentMapForSubplan(mtstate);
 
-               /*
-                * If subplan-indexed array is NULL, things should have been arranged
-                * to convert the subplan index to partition index.
-                */
-               Assert(proute && proute->subplan_partition_offsets != NULL &&
-                          whichplan < proute->num_subplan_partition_offsets);
-
-               leaf_index = proute->subplan_partition_offsets[whichplan];
-
-               return TupConvMapForLeaf(proute, getTargetResultRelInfo(mtstate),
-                                                                leaf_index);
-       }
-       else
-       {
-               Assert(whichplan >= 0 && whichplan < mtstate->mt_nplans);
-               return mtstate->mt_per_subplan_tupconv_maps[whichplan];
-       }
+       Assert(whichplan >= 0 && whichplan < mtstate->mt_nplans);
+       return mtstate->mt_per_subplan_tupconv_maps[whichplan];
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/utils/cache/partcache.c b/src/backend/utils/cache/partcache.c
index b36b7366e5..aa82aa52eb 100644
--- a/src/backend/utils/cache/partcache.c
+++ b/src/backend/utils/cache/partcache.c
@@ -594,7 +594,7 @@ RelationBuildPartitionDesc(Relation rel)
                int                     next_index = 0;
 
                result->oids = (Oid *) palloc0(nparts * sizeof(Oid));
-               result->subpartitions = (bool *) palloc(nparts * sizeof(bool));
+               result->is_leaf = (bool *) palloc(nparts * sizeof(bool));
 
                boundinfo = (PartitionBoundInfoData *)
                        palloc0(sizeof(PartitionBoundInfoData));
@@ -775,7 +775,6 @@ RelationBuildPartitionDesc(Relation rel)
                }
 
                result->boundinfo = boundinfo;
-               result->hassubpart = false; /* unless we discover otherwise below */
 
                /*
                 * Now assign OIDs from the original array into mapped indexes of the
@@ -786,33 +785,13 @@ RelationBuildPartitionDesc(Relation rel)
                for (i = 0; i < nparts; i++)
                {
                        int                     index = mapping[i];
-                       bool            subpart;
 
                        result->oids[index] = oids[i];
-
-                       subpart = (get_rel_relkind(oids[i]) == RELKIND_PARTITIONED_TABLE);
                        /* Record if the partition is a subpartitioned table */
-                       result->subpartitions[index] = subpart;
-                       result->hassubpart |= subpart;
+                       result->is_leaf[index] =
+                               (get_rel_relkind(oids[i]) != RELKIND_PARTITIONED_TABLE);
                }
 
-               /*
-                * If there are no subpartitions then we can pre-calculate the
-                * PartitionDispatch->indexes array.  Doing this here saves quite a
-                * bit of overhead on simple queries which perform INSERTs or UPDATEs
-                * on partitioned tables with many partitions.  The pre-calculation is
-                * very simple.  All we need to store is a sequence of numbers from 0
-                * to nparts - 1.
-                */
-               if (!result->hassubpart)
-               {
-                       result->indexes = (int *) palloc(nparts * sizeof(int));
-                       for (i = 0; i < nparts; i++)
-                               result->indexes[i] = i;
-               }
-               else
-                       result->indexes = NULL;
-
                pfree(mapping);
        }
 
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index a8c69ff224..8d20469c98 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -26,18 +26,11 @@
 typedef struct PartitionDescData
 {
        int                     nparts;                 /* Number of partitions */
-       Oid                *oids;                       /* OIDs array of 'nparts' of partitions in
-                                                                * partbound order */
-       int                *indexes;            /* Stores index for corresponding 'oids'
-                                                                * element for use in tuple routing, or NULL
-                                                                * if hassubpart is true.
-                                                                */
-       bool       *subpartitions;      /* Array of 'nparts' set to true if the
-                                                                * corresponding 'oids' element belongs to a
-                                                                * sub-partitioned table.
-                                                                */
-       bool            hassubpart;             /* true if any oid belongs to a
-                                                                * sub-partitioned table */
+       Oid                *oids;                       /* Array of length 'nparts' containing
+                                                                * partition OIDs in order of their
+                                                                * bounds */
+       bool       *is_leaf;            /* Array of length 'nparts' containing whether
+                                                                * a partition is a leaf partition */
        PartitionBoundInfo boundinfo;   /* collection of partition bounds */
 } PartitionDescData;
 
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 822f66f5e2..91b840e12f 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -50,72 +50,124 @@ typedef struct PartitionDispatchData
 typedef struct PartitionDispatchData *PartitionDispatch;
 
 /*-----------------------
- * PartitionTupleRouting - Encapsulates all information required to execute
- * tuple-routing between partitions.
+ * PartitionTupleRouting - Encapsulates all information required to
+ * route a tuple inserted into a partitioned table to one of its leaf
+ * partitions
  *
- * partition_dispatch_info             Array of PartitionDispatch objects with one
- *                                                             entry for every partitioned table in the
- *                                                             partition tree.
- * num_dispatch                                        number of partitioned tables in the partition
- *                                                             tree (= length of partition_dispatch_info[])
- * partition_oids                              Array of leaf partitions OIDs with one entry
- *                                                             for every leaf partition in the partition tree,
- *                                                             initialized in full by
- *                                                             ExecSetupPartitionTupleRouting.
- * partitions                                  Array of ResultRelInfo* objects with one entry
- *                                                             for every leaf partition in the partition tree,
- *                                                             initialized lazily by ExecInitPartitionInfo.
- * partitions_init                             Array of ResultRelInfo* objects in the order
- *                                                             that they were lazily initialized.
- * num_partitions                              Number of leaf partitions in the partition tree
- *                                                             (= 'partitions_oid'/'partitions' array length)
- * num_partitions_init                 Number of leaf partition lazily setup so far.
- * partitions_init_size                        Size of partitions_init array.
- * parent_child_tupconv_maps   Array of TupleConversionMap objects with one
- *                                                             entry for every leaf partition (required to
- *                                                             convert tuple from the root table's rowtype to
- *                                                             a leaf partition's rowtype after tuple routing
- *                                                             is done). Remains NULL if no maps to store.
- * child_parent_tupconv_maps   Array of TupleConversionMap objects with one
- *                                                             entry for every leaf partition (required to
- *                                                             convert an updated tuple from the leaf
- *                                                             partition's rowtype to the root table's rowtype
- *                                                             so that tuple routing can be done)
- * child_parent_map_not_required  Array of bool. True value means that a map is
- *                                                             determined to be not required for the given
- *                                                             partition. False means either we haven't yet
- *                                                             checked if a map is required, or it was
- *                                                             determined to be required.
- * subplan_partition_offsets   Integer array ordered by UPDATE subplans. Each
- *                                                             element of this array has the index into the
- *                                                             corresponding partition in partitions array.
- * num_subplan_partition_offsets  Length of 'subplan_partition_offsets' array
- * partition_tuple_slot                        TupleTableSlot to be used to manipulate any
- *                                                             given leaf partition's rowtype after that
- *                                                             partition is chosen for insertion by
- *                                                             tuple-routing.
- * root_tuple_slot                             TupleTableSlot to be used to transiently hold
- *                                                             copy of a tuple that's being moved across
- *                                                             partitions in the root partitioned table's
- *                                                             rowtype
+ *     partition_root                  Root table, that is, the table mentioned in the
+ *                                                     INSERT or UPDATE query or COPY FROM command.
+ *
+ *     partition_dispatch_info Contains PartitionDispatch objects for every
+ *                                                     partitioned table touched by tuple routing.  The
+ *                                                     entry for the root partitioned table is *always*
+ *                                                     present as the first entry of this array.
+ *
+ *     num_dispatch                    The number of existing entries and also serves as
+ *                                                     the index of the next entry to be allocated and
+ *                                                     placed in 'partition_dispatch_info'.
+ *
+ *     dispatch_allocsize              (>= 'num_dispatch') is the number of entries that
+ *                                                     can be stored in 'partition_dispatch_info' before
+ *                                                     needing to reallocate more space.
+ *
+ *     partitions                              Contains pointers to the ResultRelInfos of all leaf
+ *                                                     partitions touched by tuple routing.  Some of
+ *                                                     these are pointers to "reused" ResultRelInfos,
+ *                                                     that is, those that are created and destroyed
+ *                                                     outside execPartition.c, for example, when tuple
+ *                                                     routing is used for UPDATE queries that modify
+ *                                                     the partition key.  The rest are pointers to
+ *                                                     ResultRelInfos managed by execPartition.c itself.
+ *
+ *     num_partitions                  The number of existing entries and also serves as
+ *                                                     the index of the next entry to be allocated and
+ *                                                     placed in 'partitions'
+ *
+ *     partitions_allocsize    (>= 'num_partitions') is the number of entries
+ *                                                     that can be stored in 'partitions' before needing
+ *                                                     to reallocate more space
+ *
+ *     parent_child_tupconv_maps       Contains information to convert tuples of the
+ *                                                     root parent's rowtype to those of the leaf
+ *                                                     partitions' rowtype, but only for those partitions
+ *                                                     whose TupleDescs are physically different from the
+ *                                                     root parent's.  If none of the partitions has such
+ *                                                     a differing TupleDesc, then it's NULL.  If
+ *                                                     non-NULL, is of the same size as 'partitions', to
+ *                                                     be able to use the same array index.  Also, there
+ *                                                     need not be more of these maps than there are
+ *                                                     partitions that were touched.
+ *
+ *     parent_child_tupconv_maps_allocsize             The number of entries that can be
+ *                                                     stored in 'parent_child_tupconv_maps' before
+ *                                                     needing to reallocate more space
+ *
+ *     partition_tuple_slot    This is a tuple slot used to store a tuple using
+ *                                                     the rowtype of the partition chosen by tuple
+ *                                                     routing.  Maintained separately because partitions
+ *                                                     may have different rowtypes.
+ *
+ * Note: The following fields are used only when UPDATE ends up needing to
+ * do tuple routing.
+ *
+ *     child_parent_tupconv_maps       Information to convert tuples of the leaf
+ *                                                     partitions' rowtype to the root parent's
+ *                                                     rowtype.  These are needed by transition table
+ *                                                     machinery when storing tuples of a partition's
+ *                                                     rowtype into the transition table that can only
+ *                                                     store tuples of the root parent's rowtype.
+ *                                                     Like 'parent_child_tupconv_maps' it remains NULL
+ *                                                     if none of the partitions selected by tuple
+ *                                                     routing needed a conversion map.  Also, if non-
+ *                                                     NULL, is of the same size as 'partitions'.
+ *
+ *     child_parent_map_not_required   Stores if we don't need a conversion
+ *                                                     map for a partition so that TupConvMapForLeaf
+ *                                                     can return quickly if set
+ *
+ *     child_parent_tupconv_maps_allocsize             The number of entries that can be
+ *                                                     stored in 'child_parent_tupconv_maps' before
+ *                                                     needing to reallocate more space
+ *
+ *     subplan_partition_offsets       Maps indexes of UPDATE result
+ *                                                     rels in the per-subplan array to indexes of their
+ *                                                     pointers in 'partitions'
+ *
+ *     num_subplan_partition_offsets   The number of entries in
+ *                                                     'subplan_partition_offsets', which is the same as
+ *                                                     the number of UPDATE result rels
+ *
+ *     root_tuple_slot                 During UPDATE tuple routing, this tuple slot is
+ *                                                     used to transiently store a tuple using the root
+ *                                                     table's rowtype after converting it from the
+ *                                                     tuple's source leaf partition's rowtype.  That is,
+ *                                                     if the leaf partition's rowtype is different.
  *-----------------------
  */
 typedef struct PartitionTupleRouting
 {
+       Relation        partition_root;
+
        PartitionDispatch *partition_dispatch_info;
        int                     num_dispatch;
-       Oid                *partition_oids;
+       int                     dispatch_allocsize;
+
        ResultRelInfo **partitions;
-       ResultRelInfo **partitions_init;
        int                     num_partitions;
-       int                     num_partitions_init;
-       int                     partitions_init_size;
+       int                     partitions_allocsize;
+
        TupleConversionMap **parent_child_tupconv_maps;
+       int                     parent_child_tupconv_maps_allocsize;
+
+       TupleTableSlot *partition_tuple_slot;
+
        TupleConversionMap **child_parent_tupconv_maps;
        bool       *child_parent_map_not_required;
+       int                     child_parent_tupconv_maps_allocsize;
+
        int                *subplan_partition_offsets;
        int                     num_subplan_partition_offsets;
-       TupleTableSlot *partition_tuple_slot;
+
        TupleTableSlot *root_tuple_slot;
 } PartitionTupleRouting;
 
@@ -193,8 +245,9 @@ typedef struct PartitionPruneState
 
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
                                                           Relation rel);
-extern int ExecFindPartition(ResultRelInfo *resultRelInfo,
-                                 PartitionDispatch *pd,
+extern int ExecFindPartition(ModifyTableState *mtstate,
+                                 ResultRelInfo *resultRelInfo,
+                                 PartitionTupleRouting *proute,
                                  TupleTableSlot *slot,
                                  EState *estate);
 extern ResultRelInfo *ExecGetPartitionInfo(ModifyTableState *mtstate,
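
To make the intent of the new *_allocsize fields a bit more concrete: the arrays
in PartitionTupleRouting now start small and are enlarged only as tuple routing
actually touches more partitions.  Here is a minimal sketch of that growth step,
assuming the struct as declared above; the helper name is made up and is not
something that exists in the patch:

#include "postgres.h"
#include "executor/execPartition.h"

/*
 * Hypothetical helper: make sure proute->partitions has room for one more
 * entry, doubling the allocation once num_partitions catches up with
 * partitions_allocsize.
 */
static void
ExecEnsurePartitionsArraySpace(PartitionTupleRouting *proute)
{
	if (proute->num_partitions == proute->partitions_allocsize)
	{
		proute->partitions_allocsize *= 2;
		proute->partitions = (ResultRelInfo **)
			repalloc(proute->partitions,
					 sizeof(ResultRelInfo *) * proute->partitions_allocsize);
	}
}

The same doubling pattern is what dispatch_allocsize and the two tupconv map
allocsize counters are there for.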
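Likewise, with hassubpart and the pre-computed indexes array gone from
PartitionDescData, code that still wants to know whether a table has any
subpartitioned children can derive that from the per-partition is_leaf flags.
A rough sketch, purely for illustration (this function is not part of the
patch):

#include "postgres.h"
#include "catalog/partition.h"

/*
 * Illustrative only: true if any child of the table this PartitionDesc
 * describes is itself a partitioned table.
 */
static bool
partdesc_has_subpartitions(PartitionDesc partdesc)
{
	int			i;

	for (i = 0; i < partdesc->nparts; i++)
	{
		if (!partdesc->is_leaf[i])
			return true;
	}
	return false;
}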
