On 2017/09/05 14:11, Amit Khandekar wrote:
> Great, thanks. Just wanted to make sure someone is working on that,
> because, as you said, it is no longer an EIBO patch. Since you are
> doing that, I won't work on that.

Here is that patch (actually two patches).  Sorry it took me a bit.

Description:

[PATCH 1/2] Decouple RelationGetPartitionDispatchInfo() from executor

Currently it and the structure it generates viz. PartitionDispatch
objects are too coupled with the executor's tuple-routing code.  In
particular, it's pretty undesirable that it makes it the responsibility
of the caller to release some resources, such as executor tuple table
slots, tuple-conversion maps, etc.  After this refactoring,
ExecSetupPartitionTupleRouting() now needs to
do some of the work that was previously done in
RelationGetPartitionDispatchInfo().

[PATCH 2/2] Make RelationGetPartitionDispatch expansion order
 depth-first

This is so as it matches what the planner is doing with partitioning
inheritance expansion.  Matching with planner order helps because
it helps ease matching the executor's per-partition objects with
planner-created per-partition nodes.


Actually, I'm coming to a conclusion that we should keep any
whole-partition-tree stuff out of partition.c and its interface, as Robert
has also alluded to in an earlier message on this thread [1].  But since
that's a different topic, I'll shut up about it on this thread and start a
new thread to discuss what kind of code rearrangement is possible.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmoafr%3DhUrM%3Dcbx-k%3DBDHOF2OfXaw95HQSNAK4mHBwmSjtw%40mail.gmail.com
From 6956ac321df169f6c26c383ddcb5ea48c1a0071b Mon Sep 17 00:00:00 2001
From: amit <amitlangot...@gmail.com>
Date: Wed, 30 Aug 2017 10:02:05 +0900
Subject: [PATCH 1/3] Decouple RelationGetPartitionDispatchInfo() from executor

Currently it and the structure it generates viz. PartitionDispatch
objects are too coupled with the executor's tuple-routing code.  In
particular, it's pretty undesirable that it makes it the responsibility
of the caller to release some resources, such as executor tuple table
slots, tuple-conversion maps, etc.  That makes it harder to use in
places other than where it's currently being used.

After this refactoring, ExecSetupPartitionTupleRouting() now needs to
do some of the work that was previously done in
RelationGetPartitionDispatchInfo().
---
 src/backend/catalog/partition.c        | 53 +++++++-------------
 src/backend/commands/copy.c            | 32 +++++++------
 src/backend/executor/execMain.c        | 88 ++++++++++++++++++++++++++++++----
 src/backend/executor/nodeModifyTable.c | 37 +++++++-------
 src/include/catalog/partition.h        | 20 +++-----
 src/include/executor/executor.h        |  4 +-
 src/include/nodes/execnodes.h          | 40 +++++++++++++++-
 7 files changed, 181 insertions(+), 93 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index c6bd02f77d..4f594243d3 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1061,7 +1061,6 @@ RelationGetPartitionDispatchInfo(Relation rel,
                Relation        partrel = lfirst(lc1);
                Relation        parent = lfirst(lc2);
                PartitionKey partkey = RelationGetPartitionKey(partrel);
-               TupleDesc       tupdesc = RelationGetDescr(partrel);
                PartitionDesc partdesc = RelationGetPartitionDesc(partrel);
                int                     j,
                                        m;
@@ -1069,29 +1068,12 @@ RelationGetPartitionDispatchInfo(Relation rel,
                pd[i] = (PartitionDispatch) 
palloc(sizeof(PartitionDispatchData));
                pd[i]->reldesc = partrel;
                pd[i]->key = partkey;
-               pd[i]->keystate = NIL;
                pd[i]->partdesc = partdesc;
                if (parent != NULL)
-               {
-                       /*
-                        * For every partitioned table other than root, we must 
store a
-                        * tuple table slot initialized with its tuple 
descriptor and a
-                        * tuple conversion map to convert a tuple from its 
parent's
-                        * rowtype to its own. That is to make sure that we are 
looking at
-                        * the correct row using the correct tuple descriptor 
when
-                        * computing its partition key for tuple routing.
-                        */
-                       pd[i]->tupslot = MakeSingleTupleTableSlot(tupdesc);
-                       pd[i]->tupmap = 
convert_tuples_by_name(RelationGetDescr(parent),
-                                                                               
                   tupdesc,
-                                                                               
                   gettext_noop("could not convert row type"));
-               }
+                       pd[i]->parentoid = RelationGetRelid(parent);
                else
-               {
-                       /* Not required for the root partitioned table */
-                       pd[i]->tupslot = NULL;
-                       pd[i]->tupmap = NULL;
-               }
+                       pd[i]->parentoid = InvalidOid;
+
                pd[i]->indexes = (int *) palloc(partdesc->nparts * sizeof(int));
 
                /*
@@ -1840,7 +1822,7 @@ generate_partition_qual(Relation rel)
  *                     Construct values[] and isnull[] arrays for the 
partition key
  *                     of a tuple.
  *
- *     pd                              Partition dispatch object of the 
partitioned table
+ *     ptrinfo                 PartitionTupleRoutingInfo object of the table
  *     slot                    Heap tuple from which to extract partition key
  *     estate                  executor state for evaluating any partition key
  *                                     expressions (must be non-NULL)
@@ -1852,26 +1834,27 @@ generate_partition_qual(Relation rel)
  * ----------------
  */
 void
-FormPartitionKeyDatum(PartitionDispatch pd,
+FormPartitionKeyDatum(PartitionTupleRoutingInfo *ptrinfo,
                                          TupleTableSlot *slot,
                                          EState *estate,
                                          Datum *values,
                                          bool *isnull)
 {
+       PartitionDispatch       pd = ptrinfo->pd;
        ListCell   *partexpr_item;
        int                     i;
 
-       if (pd->key->partexprs != NIL && pd->keystate == NIL)
+       if (pd->key->partexprs != NIL && ptrinfo->keystate == NIL)
        {
                /* Check caller has set up context correctly */
                Assert(estate != NULL &&
                           GetPerTupleExprContext(estate)->ecxt_scantuple == 
slot);
 
                /* First time through, set up expression evaluation state */
-               pd->keystate = ExecPrepareExprList(pd->key->partexprs, estate);
+               ptrinfo->keystate = ExecPrepareExprList(pd->key->partexprs, 
estate);
        }
 
-       partexpr_item = list_head(pd->keystate);
+       partexpr_item = list_head(ptrinfo->keystate);
        for (i = 0; i < pd->key->partnatts; i++)
        {
                AttrNumber      keycol = pd->key->partattrs[i];
@@ -1911,13 +1894,13 @@ FormPartitionKeyDatum(PartitionDispatch pd,
  * the latter case.
  */
 int
-get_partition_for_tuple(PartitionDispatch *pd,
+get_partition_for_tuple(PartitionTupleRoutingInfo **ptrinfos,
                                                TupleTableSlot *slot,
                                                EState *estate,
-                                               PartitionDispatchData 
**failed_at,
+                                               PartitionTupleRoutingInfo 
**failed_at,
                                                TupleTableSlot **failed_slot)
 {
-       PartitionDispatch parent;
+       PartitionTupleRoutingInfo *parent;
        Datum           values[PARTITION_MAX_KEYS];
        bool            isnull[PARTITION_MAX_KEYS];
        int                     result;
@@ -1925,11 +1908,11 @@ get_partition_for_tuple(PartitionDispatch *pd,
        TupleTableSlot *ecxt_scantuple_old = ecxt->ecxt_scantuple;
 
        /* start with the root partitioned table */
-       parent = pd[0];
+       parent = ptrinfos[0];
        while (true)
        {
-               PartitionKey key = parent->key;
-               PartitionDesc partdesc = parent->partdesc;
+               PartitionKey  key = parent->pd->key;
+               PartitionDesc partdesc = parent->pd->partdesc;
                TupleTableSlot *myslot = parent->tupslot;
                TupleConversionMap *map = parent->tupmap;
                int             cur_index = -1;
@@ -2039,13 +2022,13 @@ get_partition_for_tuple(PartitionDispatch *pd,
                        *failed_slot = slot;
                        break;
                }
-               else if (parent->indexes[cur_index] >= 0)
+               else if (parent->pd->indexes[cur_index] >= 0)
                {
-                       result = parent->indexes[cur_index];
+                       result = parent->pd->indexes[cur_index];
                        break;
                }
                else
-                       parent = pd[-parent->indexes[cur_index]];
+                       parent = ptrinfos[-parent->pd->indexes[cur_index]];
        }
 
 error_exit:
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfa3f059c2..288d6a1ab2 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -165,8 +165,8 @@ typedef struct CopyStateData
        bool            volatile_defexprs;      /* is any of defexprs volatile? 
*/
        List       *range_table;
 
-       PartitionDispatch *partition_dispatch_info;
-       int                     num_dispatch;   /* Number of entries in the 
above array */
+       PartitionTupleRoutingInfo **ptrinfos;
+       int                     num_parted;             /* Number of entries in 
the above array */
        int                     num_partitions; /* Number of members in the 
following arrays */
        ResultRelInfo *partitions;      /* Per partition result relation */
        TupleConversionMap **partition_tupconv_maps;
@@ -2445,7 +2445,7 @@ CopyFrom(CopyState cstate)
         */
        if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
        {
-               PartitionDispatch *partition_dispatch_info;
+               PartitionTupleRoutingInfo **ptrinfos;
                ResultRelInfo *partitions;
                TupleConversionMap **partition_tupconv_maps;
                TupleTableSlot *partition_tuple_slot;
@@ -2455,13 +2455,13 @@ CopyFrom(CopyState cstate)
                ExecSetupPartitionTupleRouting(cstate->rel,
                                                                           1,
                                                                           
estate,
-                                                                          
&partition_dispatch_info,
+                                                                          
&ptrinfos,
                                                                           
&partitions,
                                                                           
&partition_tupconv_maps,
                                                                           
&partition_tuple_slot,
                                                                           
&num_parted, &num_partitions);
-               cstate->partition_dispatch_info = partition_dispatch_info;
-               cstate->num_dispatch = num_parted;
+               cstate->ptrinfos = ptrinfos;
+               cstate->num_parted = num_parted;
                cstate->partitions = partitions;
                cstate->num_partitions = num_partitions;
                cstate->partition_tupconv_maps = partition_tupconv_maps;
@@ -2502,7 +2502,7 @@ CopyFrom(CopyState cstate)
        if ((resultRelInfo->ri_TrigDesc != NULL &&
                 (resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
                  resultRelInfo->ri_TrigDesc->trig_insert_instead_row)) ||
-               cstate->partition_dispatch_info != NULL ||
+               cstate->ptrinfos != NULL ||
                cstate->volatile_defexprs)
        {
                useHeapMultiInsert = false;
@@ -2580,7 +2580,7 @@ CopyFrom(CopyState cstate)
                ExecStoreTuple(tuple, slot, InvalidBuffer, false);
 
                /* Determine the partition to heap_insert the tuple into */
-               if (cstate->partition_dispatch_info)
+               if (cstate->ptrinfos)
                {
                        int                     leaf_part_index;
                        TupleConversionMap *map;
@@ -2594,7 +2594,7 @@ CopyFrom(CopyState cstate)
                         * partition, respectively.
                         */
                        leaf_part_index = ExecFindPartition(resultRelInfo,
-                                                                               
                cstate->partition_dispatch_info,
+                                                                               
                cstate->ptrinfos,
                                                                                
                slot,
                                                                                
                estate);
                        Assert(leaf_part_index >= 0 &&
@@ -2826,8 +2826,8 @@ CopyFrom(CopyState cstate)
 
        ExecCloseIndices(resultRelInfo);
 
-       /* Close all the partitioned tables, leaf partitions, and their indices 
*/
-       if (cstate->partition_dispatch_info)
+       /* Release some resources that we acquired for tuple-routing. */
+       if (cstate->ptrinfos)
        {
                int                     i;
 
@@ -2837,13 +2837,15 @@ CopyFrom(CopyState cstate)
                 * the main target table of COPY that will be closed eventually 
by
                 * DoCopy().  Also, tupslot is NULL for the root partitioned 
table.
                 */
-               for (i = 1; i < cstate->num_dispatch; i++)
+               for (i = 1; i < cstate->num_parted; i++)
                {
-                       PartitionDispatch pd = 
cstate->partition_dispatch_info[i];
+                       PartitionTupleRoutingInfo *ptrinfo = 
cstate->ptrinfos[i];
 
-                       heap_close(pd->reldesc, NoLock);
-                       ExecDropSingleTupleTableSlot(pd->tupslot);
+                       heap_close(ptrinfo->pd->reldesc, NoLock);
+                       ExecDropSingleTupleTableSlot(ptrinfo->tupslot);
                }
+
+               /* Close all the leaf partitions and their indices */
                for (i = 0; i < cstate->num_partitions; i++)
                {
                        ResultRelInfo *resultRelInfo = cstate->partitions + i;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4b594d489c..fe186abe69 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -3243,8 +3243,8 @@ EvalPlanQualEnd(EPQState *epqstate)
  * tuple routing for partitioned tables
  *
  * Output arguments:
- * 'pd' receives an array of PartitionDispatch objects with one entry for
- *             every partitioned table in the partition tree
+ * 'ptrinfos' receives an array of PartitionTupleRoutingInfo objects with one
+ *             entry for each partitioned table in the partition tree
  * 'partitions' receives an array of ResultRelInfo objects with one entry for
  *             every leaf partition in the partition tree
  * 'tup_conv_maps' receives an array of TupleConversionMap objects with one
@@ -3267,7 +3267,7 @@ void
 ExecSetupPartitionTupleRouting(Relation rel,
                                                           Index resultRTindex,
                                                           EState *estate,
-                                                          PartitionDispatch 
**pd,
+                                                          
PartitionTupleRoutingInfo ***ptrinfos,
                                                           ResultRelInfo 
**partitions,
                                                           TupleConversionMap 
***tup_conv_maps,
                                                           TupleTableSlot 
**partition_tuple_slot,
@@ -3275,16 +3275,84 @@ ExecSetupPartitionTupleRouting(Relation rel,
 {
        TupleDesc       tupDesc = RelationGetDescr(rel);
        List       *leaf_parts;
+       PartitionDispatch *pds;
        ListCell   *cell;
        int                     i;
        ResultRelInfo *leaf_part_rri;
+       Relation        parent;
 
        /*
         * Get the information about the partition tree after locking all the
         * partitions.
         */
        (void) find_all_inheritors(RelationGetRelid(rel), RowExclusiveLock, 
NULL);
-       *pd = RelationGetPartitionDispatchInfo(rel, num_parted, &leaf_parts);
+       pds = RelationGetPartitionDispatchInfo(rel, num_parted, &leaf_parts);
+
+       /*
+        * Construct PartitionTupleRoutingInfo objects, one for each partitioned
+        * table in the tree, using its PartitionDispatch in the pds array.
+        */
+       *ptrinfos = (PartitionTupleRoutingInfo **) palloc0(*num_parted *
+                                                                               
sizeof(PartitionTupleRoutingInfo *));
+       parent = NULL;
+       for (i = 0; i < *num_parted; i++)
+       {
+               PartitionTupleRoutingInfo *ptrinfo;
+
+               ptrinfo = (PartitionTupleRoutingInfo *)
+                                                       
palloc0(sizeof(PartitionTupleRoutingInfo));
+               /* Stash a reference to this PartitionDispatch. */
+               ptrinfo->pd = pds[i];
+
+               /* State for extracting partition key from tuples will go here. 
*/
+               ptrinfo->keystate = NIL;
+
+               /*
+                * For every partitioned table other than root, we must store a 
tuple
+                * table slot initialized with its tuple descriptor and a tuple
+                * conversion map to convert a tuple from its parent's rowtype 
to its
+                * own.  That is to make sure that we are looking at the 
correct row
+                * using the correct tuple descriptor when computing its 
partition key
+                * for tuple routing.
+                */
+               if (pds[i]->parentoid != InvalidOid)
+               {
+                       TupleDesc       tupdesc = 
RelationGetDescr(pds[i]->reldesc);
+
+                       /* Open the parent relation descriptor if not already 
done. */
+                       if (pds[i]->parentoid == RelationGetRelid(rel))
+                               parent = rel;
+                       else if (parent == NULL)
+                               /* Locked by 
RelationGetPartitionDispatchInfo(). */
+                               parent = heap_open(pds[i]->parentoid, NoLock);
+
+                       ptrinfo->tupslot = MakeSingleTupleTableSlot(tupdesc);
+                       ptrinfo->tupmap = 
convert_tuples_by_name(RelationGetDescr(parent),
+                                                                               
                         tupdesc,
+                                                                 
gettext_noop("could not convert row type"));
+                       /*
+                        * Close the parent descriptor, if the next partitioned 
table in
+                        * the list is not a sibling, because it will have a 
different
+                        * parent if so.
+                        */
+                       if (parent != NULL && parent != rel && i + 1 < 
*num_parted &&
+                               pds[i + 1]->parentoid != pds[i]->parentoid)
+                       {
+                               heap_close(parent, NoLock);
+                               parent = NULL;
+                       }
+               }
+               else
+               {
+                       /* Not required for the root partitioned table */
+                       ptrinfo->tupslot = NULL;
+                       ptrinfo->tupmap = NULL;
+               }
+
+               (*ptrinfos)[i] = ptrinfo;
+       }
+
+       /* For leaf partitions, we build ResultRelInfos and 
TupleConversionMaps. */
        *num_partitions = list_length(leaf_parts);
        *partitions = (ResultRelInfo *) palloc(*num_partitions *
                                                                                
   sizeof(ResultRelInfo));
@@ -3361,11 +3429,13 @@ ExecSetupPartitionTupleRouting(Relation rel,
  * by get_partition_for_tuple() unchanged.
  */
 int
-ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd,
-                                 TupleTableSlot *slot, EState *estate)
+ExecFindPartition(ResultRelInfo *resultRelInfo,
+                                 PartitionTupleRoutingInfo **ptrinfos,
+                                 TupleTableSlot *slot,
+                                 EState *estate)
 {
        int                     result;
-       PartitionDispatchData *failed_at;
+       PartitionTupleRoutingInfo *failed_at;
        TupleTableSlot *failed_slot;
 
        /*
@@ -3375,7 +3445,7 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, 
PartitionDispatch *pd,
        if (resultRelInfo->ri_PartitionCheck)
                ExecPartitionCheck(resultRelInfo, slot, estate);
 
-       result = get_partition_for_tuple(pd, slot, estate,
+       result = get_partition_for_tuple(ptrinfos, slot, estate,
                                                                         
&failed_at, &failed_slot);
        if (result < 0)
        {
@@ -3385,7 +3455,7 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, 
PartitionDispatch *pd,
                char       *val_desc;
                ExprContext *ecxt = GetPerTupleExprContext(estate);
 
-               failed_rel = failed_at->reldesc;
+               failed_rel = failed_at->pd->reldesc;
                ecxt->ecxt_scantuple = failed_slot;
                FormPartitionKeyDatum(failed_at, failed_slot, estate,
                                                          key_values, 
key_isnull);
diff --git a/src/backend/executor/nodeModifyTable.c 
b/src/backend/executor/nodeModifyTable.c
index bd84778739..06c69c5783 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -278,7 +278,7 @@ ExecInsert(ModifyTableState *mtstate,
        resultRelInfo = estate->es_result_relation_info;
 
        /* Determine the partition to heap_insert the tuple into */
-       if (mtstate->mt_partition_dispatch_info)
+       if (mtstate->mt_ptrinfos)
        {
                int                     leaf_part_index;
                TupleConversionMap *map;
@@ -292,7 +292,7 @@ ExecInsert(ModifyTableState *mtstate,
                 * respectively.
                 */
                leaf_part_index = ExecFindPartition(resultRelInfo,
-                                                                               
        mtstate->mt_partition_dispatch_info,
+                                                                               
        mtstate->mt_ptrinfos,
                                                                                
        slot,
                                                                                
        estate);
                Assert(leaf_part_index >= 0 &&
@@ -1487,7 +1487,7 @@ ExecSetupTransitionCaptureState(ModifyTableState 
*mtstate, EState *estate)
                int                     numResultRelInfos;
 
                /* Find the set of partitions so that we can find their 
TupleDescs. */
-               if (mtstate->mt_partition_dispatch_info != NULL)
+               if (mtstate->mt_ptrinfos != NULL)
                {
                        /*
                         * For INSERT via partitioned table, so we need 
TupleDescs based
@@ -1911,7 +1911,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, 
int eflags)
        if (operation == CMD_INSERT &&
                rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
        {
-               PartitionDispatch *partition_dispatch_info;
+               PartitionTupleRoutingInfo **ptrinfos;
                ResultRelInfo *partitions;
                TupleConversionMap **partition_tupconv_maps;
                TupleTableSlot *partition_tuple_slot;
@@ -1921,13 +1921,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, 
int eflags)
                ExecSetupPartitionTupleRouting(rel,
                                                                           
node->nominalRelation,
                                                                           
estate,
-                                                                          
&partition_dispatch_info,
+                                                                          
&ptrinfos,
                                                                           
&partitions,
                                                                           
&partition_tupconv_maps,
                                                                           
&partition_tuple_slot,
                                                                           
&num_parted, &num_partitions);
-               mtstate->mt_partition_dispatch_info = partition_dispatch_info;
-               mtstate->mt_num_dispatch = num_parted;
+               mtstate->mt_ptrinfos = ptrinfos;
+               mtstate->mt_num_parted = num_parted;
                mtstate->mt_partitions = partitions;
                mtstate->mt_num_partitions = num_partitions;
                mtstate->mt_partition_tupconv_maps = partition_tupconv_maps;
@@ -2336,21 +2336,24 @@ ExecEndModifyTable(ModifyTableState *node)
                                                                                
                                   resultRelInfo);
        }
 
+       /* Release some resources that we acquired for tuple-routing. */
+
        /*
-        * Close all the partitioned tables, leaf partitions, and their indices
-        *
-        * Remember node->mt_partition_dispatch_info[0] corresponds to the root
-        * partitioned table, which we must not try to close, because it is the
-        * main target table of the query that will be closed by ExecEndPlan().
-        * Also, tupslot is NULL for the root partitioned table.
+        * node->mt_ptrinfos[0] corresponds to the root partitioned table, for
+        * which we didn't create tupslot.  Also, its relation descriptor will
+        * be closed in ExecEndPlan().
         */
-       for (i = 1; i < node->mt_num_dispatch; i++)
+       for (i = 1; i < node->mt_num_parted; i++)
        {
-               PartitionDispatch pd = node->mt_partition_dispatch_info[i];
+               PartitionTupleRoutingInfo *ptrinfo = node->mt_ptrinfos[i];
 
-               heap_close(pd->reldesc, NoLock);
-               ExecDropSingleTupleTableSlot(pd->tupslot);
+               heap_close(ptrinfo->pd->reldesc, NoLock);
+               ExecDropSingleTupleTableSlot(ptrinfo->tupslot);
        }
+
+       /*
+        * Close all the leaf partitions and their indices.
+        */
        for (i = 0; i < node->mt_num_partitions; i++)
        {
                ResultRelInfo *resultRelInfo = node->mt_partitions + i;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2283c675e9..1091dd572c 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -45,13 +45,8 @@ typedef struct PartitionDescData *PartitionDesc;
  *
  *     reldesc         Relation descriptor of the table
  *     key                     Partition key information of the table
- *     keystate        Execution state required for expressions in the 
partition key
  *     partdesc        Partition descriptor of the table
- *     tupslot         A standalone TupleTableSlot initialized with this 
table's tuple
- *                             descriptor
- *     tupmap          TupleConversionMap to convert from the parent's rowtype 
to
- *                             this table's rowtype (when extracting the 
partition key of a
- *                             tuple just before routing it through this table)
+ *     parentoid       OID of the parent table (InvalidOid if root partitioned 
table)
  *     indexes         Array with partdesc->nparts members (for details on what
  *                             individual members represent, see how they are 
set in
  *                             RelationGetPartitionDispatchInfo())
@@ -61,10 +56,8 @@ typedef struct PartitionDispatchData
 {
        Relation        reldesc;
        PartitionKey key;
-       List       *keystate;           /* list of ExprState */
        PartitionDesc partdesc;
-       TupleTableSlot *tupslot;
-       TupleConversionMap *tupmap;
+       Oid                     parentoid;
        int                *indexes;
 } PartitionDispatchData;
 
@@ -86,17 +79,18 @@ extern List *map_partition_varattnos(List *expr, int 
target_varno,
 extern List *RelationGetPartitionQual(Relation rel);
 extern Expr *get_partition_qual_relid(Oid relid);
 
-/* For tuple routing */
 extern PartitionDispatch *RelationGetPartitionDispatchInfo(Relation rel,
                                                                 int 
*num_parted, List **leaf_part_oids);
-extern void FormPartitionKeyDatum(PartitionDispatch pd,
+
+/* For tuple routing */
+extern void FormPartitionKeyDatum(PartitionTupleRoutingInfo *ptrinfo,
                                          TupleTableSlot *slot,
                                          EState *estate,
                                          Datum *values,
                                          bool *isnull);
-extern int get_partition_for_tuple(PartitionDispatch *pd,
+extern int get_partition_for_tuple(PartitionTupleRoutingInfo **ptrinfos,
                                                TupleTableSlot *slot,
                                                EState *estate,
-                                               PartitionDispatchData 
**failed_at,
+                                               PartitionTupleRoutingInfo 
**failed_at,
                                                TupleTableSlot **failed_slot);
 #endif                                                 /* PARTITION_H */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 770881849c..aee7a41b31 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -209,13 +209,13 @@ extern HeapTuple EvalPlanQualGetTuple(EPQState *epqstate, 
Index rti);
 extern void ExecSetupPartitionTupleRouting(Relation rel,
                                                           Index resultRTindex,
                                                           EState *estate,
-                                                          PartitionDispatch 
**pd,
+                                                          
PartitionTupleRoutingInfo ***ptrinfos,
                                                           ResultRelInfo 
**partitions,
                                                           TupleConversionMap 
***tup_conv_maps,
                                                           TupleTableSlot 
**partition_tuple_slot,
                                                           int *num_parted, int 
*num_partitions);
 extern int ExecFindPartition(ResultRelInfo *resultRelInfo,
-                                 PartitionDispatch *pd,
+                                 PartitionTupleRoutingInfo **ptrinfos,
                                  TupleTableSlot *slot,
                                  EState *estate);
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 90a60abc4d..c554a1b311 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -414,6 +414,42 @@ typedef struct ResultRelInfo
        Relation        ri_PartitionRoot;
 } ResultRelInfo;
 
+/* Forward declarations, to avoid including other headers */
+typedef struct PartitionDispatchData *PartitionDispatch;
+
+/*
+ * PartitionTupleRoutingInfo - information required for tuple-routing
+ *                                                        through one 
partitioned table in a partition
+ *                                                        tree
+ */
+typedef struct PartitionTupleRoutingInfo
+{
+
+       /* Information about the table's partitions */
+       PartitionDispatch       pd;
+
+       /*
+        * The execution state required for expressions contained in the 
partition
+        * key.  It is NIL until initialized by FormPartitionKeyDatum() if and 
when
+        * it is called; for example, the first time a tuple is routed through 
this
+        * table.
+        */
+       List   *keystate;
+
+       /*
+        * A standalone TupleTableSlot initialized with this table's tuple
+        * descriptor
+        */
+       TupleTableSlot *tupslot;
+
+       /*
+        * TupleConversionMap to convert from the parent's rowtype to this 
table's
+        * rowtype (when extracting the partition key of a tuple just before
+        * routing it through this table)
+        */
+       TupleConversionMap *tupmap;
+} PartitionTupleRoutingInfo;
+
 /* ----------------
  *       EState information
  *
@@ -973,9 +1009,9 @@ typedef struct ModifyTableState
        TupleTableSlot *mt_existing;    /* slot to store existing target tuple 
in */
        List       *mt_excludedtlist;   /* the excluded pseudo relation's tlist 
 */
        TupleTableSlot *mt_conflproj;   /* CONFLICT ... SET ... projection 
target */
-       struct PartitionDispatchData **mt_partition_dispatch_info;
        /* Tuple-routing support info */
-       int                     mt_num_dispatch;        /* Number of entries in 
the above array */
+       struct PartitionTupleRoutingInfo **mt_ptrinfos;
+       int                     mt_num_parted;          /* Number of entries in 
the above array */
        int                     mt_num_partitions;      /* Number of members in 
the following
                                                                         * 
arrays */
        ResultRelInfo *mt_partitions;   /* Per partition result relation */
-- 
2.11.0

From f95b1cb33159620175e012a471513f9a32d64c43 Mon Sep 17 00:00:00 2001
From: amit <amitlangot...@gmail.com>
Date: Fri, 8 Sep 2017 17:35:10 +0900
Subject: [PATCH 2/3] Make RelationGetPartitionDispatch expansion order
 depth-first

This is so as it matches what the planner is doing with partitioning
inheritance expansion.  Matching with planner order helps because
it helps ease matching the executor's per-partition objects with
planner-created per-partition nodes.
---
 src/backend/catalog/partition.c | 191 ++++++++++++++++------------------------
 1 file changed, 74 insertions(+), 117 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 4f594243d3..a4abe08088 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -139,6 +139,8 @@ static int32 partition_bound_cmp(PartitionKey key,
 static int partition_bound_bsearch(PartitionKey key,
                                                PartitionBoundInfo boundinfo,
                                                void *probe, bool 
probe_is_bound, bool *is_equal);
+static void get_partition_dispatch_recurse(Relation rel, Relation parent,
+                                                          List **pds, List 
**leaf_part_oids);
 
 /*
  * RelationBuildPartitionDesc
@@ -961,21 +963,6 @@ get_partition_qual_relid(Oid relid)
 }
 
 /*
- * Append OIDs of rel's partitions to the list 'partoids' and for each OID,
- * append pointer rel to the list 'parents'.
- */
-#define APPEND_REL_PARTITION_OIDS(rel, partoids, parents) \
-       do\
-       {\
-               int             i;\
-               for (i = 0; i < (rel)->rd_partdesc->nparts; i++)\
-               {\
-                       (partoids) = lappend_oid((partoids), 
(rel)->rd_partdesc->oids[i]);\
-                       (parents) = lappend((parents), (rel));\
-               }\
-       } while(0)
-
-/*
  * RelationGetPartitionDispatchInfo
  *             Returns information necessary to route tuples down a partition 
tree
  *
@@ -991,16 +978,47 @@ PartitionDispatch *
 RelationGetPartitionDispatchInfo(Relation rel,
                                                                 int 
*num_parted, List **leaf_part_oids)
 {
+       List   *pdlist;
        PartitionDispatchData **pd;
-       List       *all_parts = NIL,
-                          *all_parents = NIL,
-                          *parted_rels,
-                          *parted_rel_parents;
-       ListCell   *lc1,
-                          *lc2;
-       int                     i,
-                               k,
-                               offset;
+       ListCell *lc;
+       int             i;
+
+       Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
+
+       *num_parted = 0;
+       *leaf_part_oids = NIL;
+
+       get_partition_dispatch_recurse(rel, NULL, &pdlist, leaf_part_oids);
+       *num_parted = list_length(pdlist);
+       pd = (PartitionDispatchData **) palloc(*num_parted *
+                                                                               
   sizeof(PartitionDispatchData *));
+       i = 0;
+       foreach (lc, pdlist)
+       {
+               pd[i++] = lfirst(lc);
+       }
+
+       return pd;
+}
+
+/*
+ * get_partition_dispatch_recurse
+ *             Recursively expand partition tree rooted at rel
+ *
+ * As the partition tree is expanded in a depth-first manner, we mantain two
+ * global lists: of PartitionDispatch objects corresponding to partitioned
+ * tables in *pds and of the leaf partition OIDs in *leaf_part_oids.
+ */
+static void
+get_partition_dispatch_recurse(Relation rel, Relation parent,
+                                                          List **pds, List 
**leaf_part_oids)
+{
+       PartitionDesc   partdesc = RelationGetPartitionDesc(rel);
+       PartitionKey    partkey = RelationGetPartitionKey(rel);
+       PartitionDispatch pd;
+       int                     i;
+       int                     next_leaf_idx = list_length(*leaf_part_oids),
+                               next_parted_idx = list_length(*pds);
 
        /*
         * We rely on the relcache to traverse the partition tree to build both
@@ -1016,108 +1034,47 @@ RelationGetPartitionDispatchInfo(Relation rel,
         * a bit tricky but works because the foreach() macro doesn't fetch the
         * next list element until the bottom of the loop.
         */
-       *num_parted = 1;
-       parted_rels = list_make1(rel);
-       /* Root partitioned table has no parent, so NULL for parent */
-       parted_rel_parents = list_make1(NULL);
-       APPEND_REL_PARTITION_OIDS(rel, all_parts, all_parents);
-       forboth(lc1, all_parts, lc2, all_parents)
+       pd = (PartitionDispatch) palloc(sizeof(PartitionDispatchData));
+       *pds = lappend(*pds, pd);
+       pd->reldesc = rel;
+       pd->key = partkey;
+       pd->partdesc = partdesc;
+       if (parent != NULL)
+               pd->parentoid = RelationGetRelid(parent);
+       else
+               pd->parentoid = InvalidOid;
+       pd->indexes = (int *) palloc(partdesc->nparts * sizeof(int));
+       for (i = 0; i < partdesc->nparts; i++)
        {
-               Oid                     partrelid = lfirst_oid(lc1);
-               Relation        parent = lfirst(lc2);
+               Oid                     partrelid = partdesc->oids[i];
 
-               if (get_rel_relkind(partrelid) == RELKIND_PARTITIONED_TABLE)
+               if (get_rel_relkind(partrelid) != RELKIND_PARTITIONED_TABLE)
                {
-                       /*
-                        * Already locked by the caller.  Note that it is the
-                        * responsibility of the caller to close the below 
relcache entry,
-                        * once done using the information being collected here 
(for
-                        * example, in ExecEndModifyTable).
-                        */
-                       Relation        partrel = heap_open(partrelid, NoLock);
-
-                       (*num_parted)++;
-                       parted_rels = lappend(parted_rels, partrel);
-                       parted_rel_parents = lappend(parted_rel_parents, 
parent);
-                       APPEND_REL_PARTITION_OIDS(partrel, all_parts, 
all_parents);
+                       *leaf_part_oids = lappend_oid(*leaf_part_oids, 
partrelid);
+                       pd->indexes[i] = next_leaf_idx++;
                }
-       }
-
-       /*
-        * We want to create two arrays - one for leaf partitions and another 
for
-        * partitioned tables (including the root table and internal 
partitions).
-        * While we only create the latter here, leaf partition array of 
suitable
-        * objects (such as, ResultRelInfo) is created by the caller using the
-        * list of OIDs we return.  Indexes into these arrays get assigned in a
-        * breadth-first manner, whereby partitions of any given level are 
placed
-        * consecutively in the respective arrays.
-        */
-       pd = (PartitionDispatchData **) palloc(*num_parted *
-                                                                               
   sizeof(PartitionDispatchData *));
-       *leaf_part_oids = NIL;
-       i = k = offset = 0;
-       forboth(lc1, parted_rels, lc2, parted_rel_parents)
-       {
-               Relation        partrel = lfirst(lc1);
-               Relation        parent = lfirst(lc2);
-               PartitionKey partkey = RelationGetPartitionKey(partrel);
-               PartitionDesc partdesc = RelationGetPartitionDesc(partrel);
-               int                     j,
-                                       m;
-
-               pd[i] = (PartitionDispatch) 
palloc(sizeof(PartitionDispatchData));
-               pd[i]->reldesc = partrel;
-               pd[i]->key = partkey;
-               pd[i]->partdesc = partdesc;
-               if (parent != NULL)
-                       pd[i]->parentoid = RelationGetRelid(parent);
                else
-                       pd[i]->parentoid = InvalidOid;
-
-               pd[i]->indexes = (int *) palloc(partdesc->nparts * sizeof(int));
-
-               /*
-                * Indexes corresponding to the internal partitions are 
multiplied by
-                * -1 to distinguish them from those of leaf partitions.  
Encountering
-                * an index >= 0 means we found a leaf partition, which is 
immediately
-                * returned as the partition we are looking for.  A negative 
index
-                * means we found a partitioned table, whose PartitionDispatch 
object
-                * is located at the above index multiplied back by -1.  Using 
the
-                * PartitionDispatch object, search is continued further down 
the
-                * partition tree.
-                */
-               m = 0;
-               for (j = 0; j < partdesc->nparts; j++)
                {
-                       Oid                     partrelid = partdesc->oids[j];
+                       Relation partrel = heap_open(partrelid, NoLock);
 
-                       if (get_rel_relkind(partrelid) != 
RELKIND_PARTITIONED_TABLE)
-                       {
-                               *leaf_part_oids = lappend_oid(*leaf_part_oids, 
partrelid);
-                               pd[i]->indexes[j] = k++;
-                       }
-                       else
-                       {
-                               /*
-                                * offset denotes the number of partitioned 
tables of upper
-                                * levels including those of the current level. 
 Any partition
-                                * of this table must belong to the next level 
and hence will
-                                * be placed after the last partitioned table 
of this level.
-                                */
-                               pd[i]->indexes[j] = -(1 + offset + m);
-                               m++;
-                       }
-               }
-               i++;
+                       /*
+                        * offset denotes the number of partitioned tables of 
upper
+                        * levels including those of the current level.  Any 
partition
+                        * of this table must belong to the next level and 
hence will
+                        * be placed after the last partitioned table of this 
level.
+                        */
+                       pd->indexes[i] = -(1 + next_parted_idx);
+                       get_partition_dispatch_recurse(partrel, rel, pds, 
leaf_part_oids);
 
-               /*
-                * This counts the number of partitioned tables at upper levels
-                * including those of the current level.
-                */
-               offset += m;
+                       /*
+                        * Fast-forward both leaf partition and parted indexes 
to account
+                        * for the leaf partitions and PartitionDispatch 
objects just
+                        * added.
+                        */
+                       next_parted_idx += (list_length(*pds) - next_parted_idx 
- 1);
+                       next_leaf_idx += (list_length(*leaf_part_oids) - 
next_leaf_idx);
+               }
        }
-
-       return pd;
 }
 
 /* Module-local functions */
-- 
2.11.0

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to