I've been working on implementing a way to perform plan-time
partition-pruning that is hopefully faster than the current method of
using constraint exclusion to prune each of the potentially many
partitions one-by-one.  It's not fully cooked yet though.

Meanwhile, I thought I'd share a couple of patches that implement some
restructuring of the planner code related to partitioned table inheritance
planning that I think would be helpful.  They are to be applied on top of
the patches being discussed at [1].  Note that these patches themselves
don't implement the actual code that replaces constraint exclusion as a
method of performing partition pruning.  I will share that patch after
debugging it some more.

The main design goal of the patches I'm sharing here is to defer the
locking and opening of leaf partitions in a given partition tree to a
point after set_append_rel_size() is called on the root partitioned table.
Currently, AFAICS, we need to lock and open the child tables in
expand_inherited_rtentry() only to set the translated_vars field in the
AppendRelInfo that we create for each child.  ISTM, we can defer the
creation of a child's AppendRelInfo to the point where it (its
translated_vars field in particular) is actually used, and lock and open
the child table only then.  With the patches, although we no longer lock
and open the partition child tables in expand_inherited_rtentry(), their
RT entries are still created and added to root->parse->rtable, so that
setup_simple_rel_arrays() knows the maximum number of entries
root->simple_rel_array will need to hold and allocates memory for that
array accordingly.  Slots in simple_rel_array[] corresponding to
partition child tables stay empty until set_append_rel_size() is called
on the root parent table and determines which partitions will actually be
scanned, at which point the RelOptInfos for those partitions are created.
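
To illustrate the idea of sizing the array up front but filling the
partition slots lazily, here is a minimal standalone sketch.  It is not
code from the patches: SketchRel, SketchRoot and both functions are
hypothetical stand-ins for RelOptInfo, PlannerInfo and the planner
routines, and plain malloc/calloc stand in for palloc.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the planner's per-relation RelOptInfo. */
typedef struct SketchRel
{
    int     rti;        /* range-table index this slot belongs to */
    int     opened;     /* nonzero once the table was "locked and opened" */
} SketchRel;

/* Hypothetical stand-in for the simple_rel_array bookkeeping. */
typedef struct SketchRoot
{
    int         array_size;     /* number of RT entries + 1 */
    SketchRel **rel_array;      /* slot i stays NULL until rel i is built */
} SketchRoot;

/* Size the array from the range-table length; every slot starts empty. */
static SketchRoot *
sketch_setup_rel_array(int num_rtes)
{
    SketchRoot *root = malloc(sizeof(SketchRoot));

    assert(root != NULL);
    root->array_size = num_rtes + 1;    /* index 0 is unused */
    root->rel_array = calloc(root->array_size, sizeof(SketchRel *));
    assert(root->rel_array != NULL);
    return root;
}

/* Build the entry for rti on first use; later calls return the same one. */
static SketchRel *
sketch_get_or_build_rel(SketchRoot *root, int rti)
{
    assert(rti > 0 && rti < root->array_size);
    if (root->rel_array[rti] == NULL)
    {
        SketchRel  *rel = malloc(sizeof(SketchRel));

        assert(rel != NULL);
        rel->rti = rti;
        rel->opened = 1;    /* the point where lock + open would happen */
        root->rel_array[rti] = rel;
    }
    return root->rel_array[rti];
}
```

In this sketch, only the slots for which sketch_get_or_build_rel() is
called ever get an entry; the slots of partitions that are never chosen
stay NULL and cost nothing beyond the pointer.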

The patch augments the existing PartitionedChildRelInfo node, which
currently holds only the RT indexes of the partitioned child rels, to
carry more information about the partition tree, including the
information returned by RelationGetPartitionDispatchInfo() when it is
called from expand_inherited_rtentry() (per the proposed patch in [1], we
call it to be able to add partitions to the query tree in the bound
order).  Since PartitionedChildRelInfo now contains more information about
the partition tree than it used to, I thought the struct's name no longer
fit, so I renamed it to PartitionRootInfo and renamed root->pcinfo_list to
prinfo_list accordingly.  That seems okay because we only use that node
internally.

Then during the add_base_rels_to_query() step, when build_simple_rel()
builds a RelOptInfo for the root partitioned table, it also initializes
some newly introduced fields in RelOptInfo from the information contained
in PartitionRootInfo of the table.  The aforementioned fields are only
initialized in RelOptInfos of root partitioned tables.  Note that the
add_base_rels_to_query() step won't add the partition "otherrel"
RelOptInfos yet (unlike the regular inheritance case, where they are,
after looking them up in root->append_rel_list).

When set_append_rel_size() is called on the root partitioned table, it
calls find_partitions_for_query(), which uses the partition tree
information to determine the partitions that need to be scanned for the
query.  This processing happens recursively; that is, we first determine
the root parent's partitions, and then for each partition that is itself
partitioned, we determine its partitions, and so on.  As we determine
partitions in this per-partitioned-table manner, we maintain a pair
(parent_relid, list-of-partition-relids-to-scan) for each partitioned
table and also a single list of all leaf partitions determined so far.
Once all partitions have been determined, we turn to locking the leaf
partitions.  The locking happens in the order in which
find_all_inheritors() would have returned the OIDs in
expand_inherited_rtentry(); the list of OIDs in that original order is
also stored in the table's PartitionRootInfo node.  For each OID in that
list, we check whether it is in the set of leaf partition OIDs that was
just computed, and if so, lock it.  For every chosen partition that is a
partitioned table (including the root), we create a PartitionAppendInfo
node, which stores the aforementioned pair (parent_relid,
list-of-partition-relids-to-scan), and append it to a list in the root
table's RelOptInfo, with the root table's PartitionAppendInfo at the head
of the list.  Note that the list of partitions in this pair contains only
the immediate partitions, so that the original parent-child relationship
is reflected in the list of PartitionAppendInfos thus collected.  The next
patch, which will implement the actual partition pruning, will add some
more code that runs under find_partitions_for_query().
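
The locking step described above can be sketched as follows.  This is
only an illustration under my own assumptions, not the patch's code:
SketchOid and both functions are hypothetical, plain arrays stand in for
the OID lists, and the result array merely records the order in which a
real implementation would take the locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int SketchOid;     /* stand-in for Oid */

/* Linear membership test; real code would likely use a hash or a bitmap. */
static bool
sketch_oid_member(const SketchOid *set, size_t n, SketchOid oid)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == oid)
            return true;
    return false;
}

/*
 * Walk inh_order (the OIDs in their stored original order) and copy into
 * lock_order those that are also in chosen.  Returns the number of OIDs
 * written; lock_order must have room for ninh entries.
 */
static size_t
sketch_lock_order(const SketchOid *inh_order, size_t ninh,
                  const SketchOid *chosen, size_t nchosen,
                  SketchOid *lock_order)
{
    size_t      nlocked = 0;

    assert(inh_order != NULL && lock_order != NULL);
    for (size_t i = 0; i < ninh; i++)
    {
        /* This is where the lock on the leaf partition would be taken. */
        if (sketch_oid_member(chosen, nchosen, inh_order[i]))
            lock_order[nlocked++] = inh_order[i];
    }
    return nlocked;
}
```

The point of iterating over the stored list rather than over the chosen
set is that the lock order then does not depend on the order in which the
partitions happened to be selected.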

set_append_rel_size() processing then continues for the root partitioned
table.  It is at this point that we create the RelOptInfos and
AppendRelInfos for partitions: first for those of the root partitioned
table, and then for those of each partitioned table as
set_append_rel_size() is recursively called for the latter.
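
A rough sketch of that traversal, assuming the per-parent pairs are
available as a flat array: SketchAppendInfo is a hypothetical analogue of
PartitionAppendInfo, relids are plain ints, and the built[] array only
records the order in which child rels would be created.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_PARTS 8

/* Hypothetical analogue of the (parent_relid, partitions-to-scan) pair. */
typedef struct SketchAppendInfo
{
    int     parent_relid;
    int     nparts;
    int     part_relids[SKETCH_MAX_PARTS];  /* immediate partitions only */
} SketchAppendInfo;

/* Find the pair whose parent is relid, or NULL if relid is a leaf. */
static const SketchAppendInfo *
sketch_find_info(const SketchAppendInfo *infos, size_t ninfos, int relid)
{
    for (size_t i = 0; i < ninfos; i++)
        if (infos[i].parent_relid == relid)
            return &infos[i];
    return NULL;
}

/*
 * Append to built[] the relids whose child rels would be created for
 * parent_relid, then recurse into partitioned children.  Returns the new
 * number of entries in built[].
 */
static size_t
sketch_build_children(const SketchAppendInfo *infos, size_t ninfos,
                      int parent_relid, int *built, size_t nbuilt)
{
    const SketchAppendInfo *info =
        sketch_find_info(infos, ninfos, parent_relid);

    if (info == NULL)
        return nbuilt;      /* leaf partition: nothing below it */
    assert(info->nparts <= SKETCH_MAX_PARTS);

    /* First, all immediate partitions of this parent. */
    for (int i = 0; i < info->nparts; i++)
        built[nbuilt++] = info->part_relids[i];

    /* Then recurse into the partitions that are themselves partitioned. */
    for (int i = 0; i < info->nparts; i++)
        nbuilt = sketch_build_children(infos, ninfos,
                                       info->part_relids[i], built, nbuilt);
    return nbuilt;
}
```

Because each pair lists only immediate partitions, the recursion follows
the original parent-child structure without needing any extra links.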


Note that this is still largely a WIP patch and the implementation details
might change per both the feedback here and the discussion at [1].

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/befd7ec9-8f4c-6928-d330-ab05dbf860bf%40lab.ntt.co.jp
From 567e07fa19af575ece50f607a4374c370ae7375f Mon Sep 17 00:00:00 2001
From: amit <amitlangot...@gmail.com>
Date: Tue, 8 Aug 2017 18:42:30 +0900
Subject: [PATCH 1/3] Teach pg_inherits.c a bit about partitioning

Both find_inheritance_children and find_all_inheritors now list
partitioned child tables before non-partitioned ones and return
the number of partitioned tables in an optional output argument.

When adding a new child, we also now record in pg_inherits whether
the child is a partitioned table.

Per design idea from Robert Haas
---
 contrib/sepgsql/dml.c                  |   2 +-
 doc/src/sgml/catalogs.sgml             |  10 +++
 src/backend/catalog/partition.c        |   2 +-
 src/backend/catalog/pg_inherits.c      | 157 ++++++++++++++++++++++++++-------
 src/backend/commands/analyze.c         |   3 +-
 src/backend/commands/lockcmds.c        |   2 +-
 src/backend/commands/publicationcmds.c |   2 +-
 src/backend/commands/tablecmds.c       |  56 +++++++-----
 src/backend/commands/vacuum.c          |   3 +-
 src/backend/executor/execMain.c        |   3 +-
 src/backend/optimizer/prep/prepunion.c |   2 +-
 src/include/catalog/pg_inherits.h      |  20 ++++-
 src/include/catalog/pg_inherits_fn.h   |   5 +-
 13 files changed, 200 insertions(+), 67 deletions(-)

diff --git a/contrib/sepgsql/dml.c b/contrib/sepgsql/dml.c
index b643720e36..6fc279805c 100644
--- a/contrib/sepgsql/dml.c
+++ b/contrib/sepgsql/dml.c
@@ -333,7 +333,7 @@ sepgsql_dml_privileges(List *rangeTabls, bool abort_on_violation)
                if (!rte->inh)
                        tableIds = list_make1_oid(rte->relid);
                else
-                       tableIds = find_all_inheritors(rte->relid, NoLock, NULL);
+                       tableIds = find_all_inheritors(rte->relid, NoLock, NULL, NULL);
 
                foreach(li, tableIds)
                {
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ef7054cf26..00ba2906c2 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -3894,6 +3894,16 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</>:<replaceable>&lt;salt&gt;<
        inherited columns are to be arranged.  The count starts at 1.
       </entry>
      </row>
+
+     <row>
+      <entry><structfield>inhpartitioned</structfield></entry>
+      <entry><type>bool</type></entry>
+      <entry></entry>
+      <entry>
+       This is <literal>true</> if the child table is a partitioned table,
+       <literal>false</> otherwise.
+      </entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 7618e4cb31..36f5c80b4f 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -196,7 +196,7 @@ RelationBuildPartitionDesc(Relation rel)
                return;
 
        /* Get partition oids from pg_inherits */
-       inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock);
+       inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock, NULL);
 
        /* Collect bound spec nodes in a list */
        i = 0;
diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 245a374fc9..5292ec8058 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -33,6 +33,8 @@
 #include "utils/syscache.h"
 #include "utils/tqual.h"
 
+static int32 inhchildinfo_cmp(const void *p1, const void *p2);
+
 /*
  * Entry of a hash table used in find_all_inheritors. See below.
  */
@@ -42,6 +44,30 @@ typedef struct SeenRelsEntry
        ListCell   *numparents_cell;    /* corresponding list cell */
 } SeenRelsEntry;
 
+/* Information about one inheritance child table. */
+typedef struct InhChildInfo
+{
+       Oid                     relid;
+       bool            is_partitioned;
+} InhChildInfo;
+
+#define OID_CMP(o1, o2) \
+               ((o1) < (o2) ? -1 : ((o1) > (o2) ? 1 : 0))
+
+static int32
+inhchildinfo_cmp(const void *p1, const void *p2)
+{
+       InhChildInfo c1 = *((const InhChildInfo *) p1);
+       InhChildInfo c2 = *((const InhChildInfo *) p2);
+
+       if (c1.is_partitioned && !c2.is_partitioned)
+               return -1;
+       if (!c1.is_partitioned && c2.is_partitioned)
+               return 1;
+
+       return OID_CMP(c1.relid, c2.relid);
+}
+
 /*
  * find_inheritance_children
  *
@@ -54,7 +80,8 @@ typedef struct SeenRelsEntry
  * against possible DROPs of child relations.
  */
 List *
-find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
+find_inheritance_children(Oid parentrelId, LOCKMODE lockmode,
+                                                 int *num_partitioned_children)
 {
        List       *list = NIL;
        Relation        relation;
@@ -62,9 +89,10 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
        ScanKeyData key[1];
        HeapTuple       inheritsTuple;
        Oid                     inhrelid;
-       Oid                *oidarr;
-       int                     maxoids,
-                               numoids,
+       InhChildInfo *inhchildren;
+       int                     maxchildren,
+                               numchildren,
+                               my_num_partitioned_children,
                                i;
 
        /*
@@ -77,9 +105,10 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
        /*
         * Scan pg_inherits and build a working array of subclass OIDs.
         */
-       maxoids = 32;
-       oidarr = (Oid *) palloc(maxoids * sizeof(Oid));
-       numoids = 0;
+       maxchildren = 32;
+       inhchildren = (InhChildInfo *) palloc(maxchildren * sizeof(InhChildInfo));
+       numchildren = 0;
+       my_num_partitioned_children = 0;
 
        relation = heap_open(InheritsRelationId, AccessShareLock);
 
@@ -93,34 +122,45 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
 
        while ((inheritsTuple = systable_getnext(scan)) != NULL)
        {
-               inhrelid = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhrelid;
-               if (numoids >= maxoids)
+               Form_pg_inherits form = (Form_pg_inherits) GETSTRUCT(inheritsTuple);
+
+               if (numchildren >= maxchildren)
                {
-                       maxoids *= 2;
-                       oidarr = (Oid *) repalloc(oidarr, maxoids * sizeof(Oid));
+                       maxchildren *= 2;
+                       inhchildren = (InhChildInfo *) repalloc(inhchildren,
+                                                                               maxchildren * sizeof(InhChildInfo));
                }
-               oidarr[numoids++] = inhrelid;
+               inhchildren[numchildren].relid = form->inhrelid;
+               inhchildren[numchildren].is_partitioned = form->inhpartitioned;
+
+               if (form->inhpartitioned)
+                       my_num_partitioned_children++;
+               numchildren++;
        }
 
        systable_endscan(scan);
 
        heap_close(relation, AccessShareLock);
 
+       if (num_partitioned_children)
+               *num_partitioned_children = my_num_partitioned_children;
+
        /*
         * If we found more than one child, sort them by OID.  This ensures
         * reasonably consistent behavior regardless of the vagaries of an
         * indexscan.  This is important since we need to be sure all backends
         * lock children in the same order to avoid needless deadlocks.
         */
-       if (numoids > 1)
-               qsort(oidarr, numoids, sizeof(Oid), oid_cmp);
+       if (numchildren > 1)
+               qsort(inhchildren, numchildren, sizeof(InhChildInfo),
+                         inhchildinfo_cmp);
 
        /*
         * Acquire locks and build the result list.
         */
-       for (i = 0; i < numoids; i++)
+       for (i = 0; i < numchildren; i++)
        {
-               inhrelid = oidarr[i];
+               inhrelid = inhchildren[i].relid;
 
                if (lockmode != NoLock)
                {
@@ -144,7 +184,7 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
                list = lappend_oid(list, inhrelid);
        }
 
-       pfree(oidarr);
+       pfree(inhchildren);
 
        return list;
 }
@@ -159,19 +199,30 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
  *             given rel.
  *
  * The specified lock type is acquired on all child relations (but not on the
- * given rel; caller should already have locked it).  If lockmode is NoLock
- * then no locks are acquired, but caller must beware of race conditions
- * against possible DROPs of child relations.
+ * given rel; caller should already have locked it), unless
+ * lock_only_partitioned_children is specified, in which case, only the
+ * child relations that are partitioned tables are locked.  If lockmode is
+ * NoLock then no locks are acquired, but caller must beware of race
+ * conditions against possible DROPs of child relations.
+ *
+ * Returned list of OIDs is such that all the partitioned tables in the tree
+ * appear at the head of the list.  If num_partitioned_children is non-NULL,
+ * *num_partitioned_children returns the number of partitioned child table
+ * OIDs at the head of the list.
  */
 List *
-find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents)
+find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
+                                       List **numparents, int *num_partitioned_children)
 {
        /* hash table for O(1) rel_oid -> rel_numparents cell lookup */
        HTAB       *seen_rels;
        HASHCTL         ctl;
        List       *rels_list,
-                          *rel_numparents;
+                          *rel_numparents,
+                          *partitioned_rels_list,
+                          *other_rels_list;
        ListCell   *l;
+       int                     my_num_partitioned_children;
 
        memset(&ctl, 0, sizeof(ctl));
        ctl.keysize = sizeof(Oid);
@@ -185,31 +236,69 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents)
 
        /*
         * We build a list starting with the given rel and adding all direct and
-        * indirect children.  We can use a single list as both the record of
-        * already-found rels and the agenda of rels yet to be scanned for more
-        * children.  This is a bit tricky but works because the foreach() macro
-        * doesn't fetch the next list element until the bottom of the loop.
+        * indirect children.  We can use a single list (rels_list) as both the
+        * record of already-found rels and the agenda of rels yet to be scanned
+        * for more children.  This is a bit tricky but works because the foreach()
+        * macro doesn't fetch the next list element until the bottom of the loop.
+        *
+        * partitioned_rels_list will contain the OIDs of the partitioned child
+        * tables and other_rels_list will contain the OIDs of the non-partitioned
+        * child tables.  Result list will be generated by concatenating the two
+        * lists together with partitioned_rels_list appearing first.
         */
        rels_list = list_make1_oid(parentrelId);
+       partitioned_rels_list = list_make1_oid(parentrelId);
+       other_rels_list = NIL;
        rel_numparents = list_make1_int(0);
 
+       my_num_partitioned_children = 0;
+
        foreach(l, rels_list)
        {
                Oid                     currentrel = lfirst_oid(l);
                List       *currentchildren;
-               ListCell   *lc;
+               ListCell   *lc,
+                                  *first_nonpartitioned_child;
+               int                     cur_num_partitioned_children = 0,
+                                       i;
 
                /* Get the direct children of this rel */
-               currentchildren = find_inheritance_children(currentrel, lockmode);
+               currentchildren = find_inheritance_children(currentrel, lockmode,
+                                                                                       &cur_num_partitioned_children);
+
+               my_num_partitioned_children += cur_num_partitioned_children;
+
+               /*
+                * Append partitioned children to rels_list and partitioned_rels_list.
+                * We know for sure that partitioned children don't need the
+                * de-duplication logic in the following loop, because partitioned
+                * tables are not allowed to participate in multiple inheritance.
+                */
+               i = 0;
+               foreach(lc, currentchildren)
+               {
+                       if (i < cur_num_partitioned_children)
+                       {
+                               Oid             child_oid = lfirst_oid(lc);
+
+                               rels_list = lappend_oid(rels_list, child_oid);
+                               partitioned_rels_list = lappend_oid(partitioned_rels_list,
+                                                                                                       child_oid);
+                       }
+                       else
+                               break;
+                       i++;
+               }
+               first_nonpartitioned_child = lc;
 
                /*
                 * Add to the queue only those children not already seen. This avoids
                 * making duplicate entries in case of multiple inheritance paths from
                 * the same parent.  (It'll also keep us from getting into an infinite
                 * loop, though theoretically there can't be any cycles in the
-                * inheritance graph anyway.)
+                * inheritance graph anyway.)  Also, add them to the other_rels_list.
                 */
-               foreach(lc, currentchildren)
+               for_each_cell(lc, first_nonpartitioned_child)
                {
                        Oid                     child_oid = lfirst_oid(lc);
                        bool            found;
@@ -225,6 +314,7 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents)
                        {
                                /* if it's not there, add it. expect 1 parent, initially. */
                                rels_list = lappend_oid(rels_list, child_oid);
+                               other_rels_list = lappend_oid(other_rels_list, child_oid);
                                rel_numparents = lappend_int(rel_numparents, 1);
                                hash_entry->numparents_cell = rel_numparents->tail;
                        }
@@ -237,8 +327,13 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents)
                list_free(rel_numparents);
 
        hash_destroy(seen_rels);
+       list_free(rels_list);
+
+       if (num_partitioned_children)
+               *num_partitioned_children = my_num_partitioned_children;
 
-       return rels_list;
+       /* List partitioned child tables before non-partitioned ones. */
+       return list_concat(partitioned_rels_list, other_rels_list);
 }
 
 
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index fbad13ea94..10cc2b8314 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1282,7 +1282,8 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
         * the children.
         */
        tableOIDs =
-               find_all_inheritors(RelationGetRelid(onerel), AccessShareLock, NULL);
+               find_all_inheritors(RelationGetRelid(onerel), AccessShareLock, NULL,
+                                                       NULL);
 
        /*
         * Check that there's at least one descendant, else fail.  This could
diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c
index 9fe9e022b0..529f244f7e 100644
--- a/src/backend/commands/lockcmds.c
+++ b/src/backend/commands/lockcmds.c
@@ -112,7 +112,7 @@ LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait)
        List       *children;
        ListCell   *lc;
 
-       children = find_inheritance_children(reloid, NoLock);
+       children = find_inheritance_children(reloid, NoLock, NULL);
 
        foreach(lc, children)
        {
diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c
index 610cb499d2..64179ea3ef 100644
--- a/src/backend/commands/publicationcmds.c
+++ b/src/backend/commands/publicationcmds.c
@@ -516,7 +516,7 @@ OpenTableList(List *tables)
                        List       *children;
 
                        children = find_all_inheritors(myrelid, ShareUpdateExclusiveLock,
-                                                                                  NULL);
+                                                                                  NULL, NULL);
 
                        foreach(child, children)
                        {
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 0f08245a67..4d686a6f71 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -299,10 +299,10 @@ static bool MergeCheckConstraint(List *constraints, char *name, Node *expr);
 static void MergeAttributesIntoExisting(Relation child_rel, Relation parent_rel);
 static void MergeConstraintsIntoExisting(Relation child_rel, Relation parent_rel);
 static void StoreCatalogInheritance(Oid relationId, List *supers,
-                                               bool child_is_partition);
+                                               bool child_is_partition, bool child_is_partitioned);
 static void StoreCatalogInheritance1(Oid relationId, Oid parentOid,
                                                 int16 seqNumber, Relation inhRelation,
-                                                bool child_is_partition);
+                                                bool child_is_partition, bool child_is_partitioned);
 static int     findAttrByName(const char *attributeName, List *schema);
 static void AlterIndexNamespaces(Relation classRel, Relation rel,
                                         Oid oldNspOid, Oid newNspOid, ObjectAddresses *objsMoved);
@@ -753,7 +753,8 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
                                                                                  typaddress);
 
        /* Store inheritance information for new rel. */
-       StoreCatalogInheritance(relationId, inheritOids, stmt->partbound != NULL);
+       StoreCatalogInheritance(relationId, inheritOids, stmt->partbound != NULL,
+                                                       relkind == RELKIND_PARTITIONED_TABLE);
 
        /*
         * We must bump the command counter to make the newly-created relation
@@ -1238,7 +1239,8 @@ ExecuteTruncate(TruncateStmt *stmt)
                        ListCell   *child;
                        List       *children;
 
-                       children = find_all_inheritors(myrelid, AccessExclusiveLock, NULL);
+                       children = find_all_inheritors(myrelid, AccessExclusiveLock, NULL,
+                                                                                  NULL);
 
                        foreach(child, children)
                        {
@@ -2305,7 +2307,7 @@ MergeCheckConstraint(List *constraints, char *name, Node *expr)
  */
 static void
 StoreCatalogInheritance(Oid relationId, List *supers,
-                                               bool child_is_partition)
+                                               bool child_is_partition, bool child_is_partitioned)
 {
        Relation        relation;
        int16           seqNumber;
@@ -2336,7 +2338,7 @@ StoreCatalogInheritance(Oid relationId, List *supers,
                Oid                     parentOid = lfirst_oid(entry);
 
                StoreCatalogInheritance1(relationId, parentOid, seqNumber, relation,
-                                                                child_is_partition);
+                                                                child_is_partition, child_is_partitioned);
                seqNumber++;
        }
 
@@ -2350,7 +2352,7 @@ StoreCatalogInheritance(Oid relationId, List *supers,
 static void
 StoreCatalogInheritance1(Oid relationId, Oid parentOid,
                                                 int16 seqNumber, Relation inhRelation,
-                                                bool child_is_partition)
+                                                bool child_is_partition, bool child_is_partitioned)
 {
        TupleDesc       desc = RelationGetDescr(inhRelation);
        Datum           values[Natts_pg_inherits];
@@ -2365,6 +2367,8 @@ StoreCatalogInheritance1(Oid relationId, Oid parentOid,
        values[Anum_pg_inherits_inhrelid - 1] = ObjectIdGetDatum(relationId);
        values[Anum_pg_inherits_inhparent - 1] = ObjectIdGetDatum(parentOid);
        values[Anum_pg_inherits_inhseqno - 1] = Int16GetDatum(seqNumber);
+       values[Anum_pg_inherits_inhpartitioned - 1] =
+                                                                       BoolGetDatum(child_is_partitioned);
 
        memset(nulls, 0, sizeof(nulls));
 
@@ -2564,7 +2568,7 @@ renameatt_internal(Oid myrelid,
                 * outside the inheritance hierarchy being processed.
                 */
                child_oids = find_all_inheritors(myrelid, AccessExclusiveLock,
-                                                                                &child_numparents);
+                                                                                &child_numparents, NULL);
 
                /*
                 * find_all_inheritors does the recursive search of the inheritance
@@ -2591,7 +2595,7 @@ renameatt_internal(Oid myrelid,
                 * expected_parents will only be 0 if we are not already recursing.
                 */
                if (expected_parents == 0 &&
-                       find_inheritance_children(myrelid, NoLock) != NIL)
+                       find_inheritance_children(myrelid, NoLock, NULL) != NIL)
                        ereport(ERROR,
                                        (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
                                         errmsg("inherited column \"%s\" must be renamed in child tables too",
@@ -2774,7 +2778,7 @@ rename_constraint_internal(Oid myrelid,
                                           *li;
 
                        child_oids = find_all_inheritors(myrelid, AccessExclusiveLock,
-                                                                                        &child_numparents);
+                                                                                        &child_numparents, NULL);
 
                        forboth(lo, child_oids, li, child_numparents)
                        {
@@ -2790,7 +2794,7 @@ rename_constraint_internal(Oid myrelid,
                else
                {
                        if (expected_parents == 0 &&
-                               find_inheritance_children(myrelid, NoLock) != NIL)
+                               find_inheritance_children(myrelid, NoLock, NULL) != NIL)
                                ereport(ERROR,
                                                (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
                                                 errmsg("inherited constraint \"%s\" must be renamed in child tables too",
@@ -4803,7 +4807,7 @@ ATSimpleRecursion(List **wqueue, Relation rel,
                ListCell   *child;
                List       *children;
 
-               children = find_all_inheritors(relid, lockmode, NULL);
+               children = find_all_inheritors(relid, lockmode, NULL, NULL);
 
                /*
                 * find_all_inheritors does the recursive search of the inheritance
@@ -5212,7 +5216,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
         */
        if (colDef->identity &&
                recurse &&
-               find_inheritance_children(myrelid, NoLock) != NIL)
+               find_inheritance_children(myrelid, NoLock, NULL) != NIL)
                ereport(ERROR,
                                (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
                                 errmsg("cannot recursively add identity column to table that has child tables")));
@@ -5418,7 +5422,8 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
         * routines, we have to do this one level of recursion at a time; we can't
         * use find_all_inheritors to do it in one pass.
         */
-       children = find_inheritance_children(RelationGetRelid(rel), lockmode);
+       children = find_inheritance_children(RelationGetRelid(rel), lockmode,
+                                                                                NULL);
 
        /*
         * If we are told not to recurse, there had better not be any child
@@ -6537,7 +6542,8 @@ ATExecDropColumn(List **wqueue, Relation rel, const char *colName,
         * routines, we have to do this one level of recursion at a time; we can't
         * use find_all_inheritors to do it in one pass.
         */
-       children = find_inheritance_children(RelationGetRelid(rel), lockmode);
+       children = find_inheritance_children(RelationGetRelid(rel), lockmode,
+                                                                                NULL);
 
        if (children)
        {
@@ -6971,7 +6977,8 @@ ATAddCheckConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel,
         * routines, we have to do this one level of recursion at a time; we can't
         * use find_all_inheritors to do it in one pass.
         */
-       children = find_inheritance_children(RelationGetRelid(rel), lockmode);
+       children = find_inheritance_children(RelationGetRelid(rel), lockmode,
+                                                                                NULL);
 
        /*
         * Check if ONLY was specified with ALTER TABLE.  If so, allow the
@@ -7692,7 +7699,7 @@ ATExecValidateConstraint(Relation rel, char *constrName, bool recurse,
                         */
                        if (!recursing && !con->connoinherit)
                                children = find_all_inheritors(RelationGetRelid(rel),
-                                                                                          lockmode, NULL);
+                                                                                          lockmode, NULL, NULL);
 
                        /*
                         * For CHECK constraints, we must ensure that we only mark the
@@ -8575,7 +8582,8 @@ ATExecDropConstraint(Relation rel, const char *constrName,
         * use find_all_inheritors to do it in one pass.
         */
        if (!is_no_inherit_constraint)
-               children = find_inheritance_children(RelationGetRelid(rel), lockmode);
+               children = find_inheritance_children(RelationGetRelid(rel), lockmode,
+                                                    NULL);
        else
                children = NIL;
 
@@ -8864,7 +8872,7 @@ ATPrepAlterColumnType(List **wqueue,
                ListCell   *child;
                List       *children;
 
-               children = find_all_inheritors(relid, lockmode, NULL);
+               children = find_all_inheritors(relid, lockmode, NULL, NULL);
 
                /*
                 * find_all_inheritors does the recursive search of the inheritance
@@ -8915,7 +8923,8 @@ ATPrepAlterColumnType(List **wqueue,
                }
        }
        else if (!recursing &&
-                        find_inheritance_children(RelationGetRelid(rel), NoLock) != NIL)
+                        find_inheritance_children(RelationGetRelid(rel),
+                                                  NoLock, NULL) != NIL)
                ereport(ERROR,
                                (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
                                 errmsg("type of inherited column \"%s\" must be changed in child tables too",
@@ -11027,7 +11036,7 @@ ATExecAddInherit(Relation child_rel, RangeVar *parent, LOCKMODE lockmode)
         * We use weakest lock we can on child's children, namely AccessShareLock.
         */
        children = find_all_inheritors(RelationGetRelid(child_rel),
-                                      AccessShareLock, NULL);
+                                      AccessShareLock, NULL, NULL);
 
        if (list_member_oid(children, RelationGetRelid(parent_rel)))
                ereport(ERROR,
@@ -11136,6 +11145,8 @@ CreateInheritance(Relation child_rel, Relation parent_rel)
                                                         inhseqno + 1,
                                                         catalogRelation,
                                                         parent_rel->rd_rel->relkind ==
+                                                        RELKIND_PARTITIONED_TABLE,
+                                                        child_rel->rd_rel->relkind ==
                                                         RELKIND_PARTITIONED_TABLE);
 
        /* Now we're done with pg_inherits */
@@ -13696,7 +13707,8 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
         * weaker lock now and the stronger one only when needed.
         */
        attachrel_children = find_all_inheritors(RelationGetRelid(attachrel),
-                                                AccessExclusiveLock, NULL);
+                                                AccessExclusiveLock, NULL,
+                                                NULL);
        if (list_member_oid(attachrel_children, RelationGetRelid(rel)))
                ereport(ERROR,
                                (errcode(ERRCODE_DUPLICATE_TABLE),
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index faa181207a..e2e5ffce42 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -430,7 +430,8 @@ get_rel_oids(Oid relid, const RangeVar *vacrel)
                oldcontext = MemoryContextSwitchTo(vac_context);
                if (include_parts)
                        oid_list = list_concat(oid_list,
-                                              find_all_inheritors(relid, NoLock, NULL));
+                                              find_all_inheritors(relid, NoLock, NULL,
+                                                                  NULL));
                else
                        oid_list = lappend_oid(oid_list, relid);
                MemoryContextSwitchTo(oldcontext);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index a03188aba3..4424649769 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -3278,7 +3278,8 @@ ExecSetupPartitionTupleRouting(Relation rel,
         * Get the information about the partition tree after locking all the
         * partitions.
         */
-       (void) find_all_inheritors(RelationGetRelid(rel), RowExclusiveLock, NULL);
+       (void) find_all_inheritors(RelationGetRelid(rel), RowExclusiveLock, NULL,
+                                                          NULL);
        RelationGetPartitionDispatchInfo(rel, &ptinfos, &leaf_parts);
 
        /*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 68d0d8efa3..b84d6c8878 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -1425,7 +1425,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
                lockmode = AccessShareLock;
 
        /* Scan for all members of inheritance set, acquire needed locks */
-       inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);
+       inhOIDs = find_all_inheritors(parentOID, lockmode, NULL, NULL);
 
        /*
         * Check that there's at least one descendant, else treat as no-child
diff --git a/src/include/catalog/pg_inherits.h b/src/include/catalog/pg_inherits.h
index 26bfab5db6..9f59c017e7 100644
--- a/src/include/catalog/pg_inherits.h
+++ b/src/include/catalog/pg_inherits.h
@@ -30,9 +30,20 @@
 
 CATALOG(pg_inherits,2611) BKI_WITHOUT_OIDS
 {
+       /* OID of the child table. */
        Oid                     inhrelid;
+
+       /* OID of the parent table. */
        Oid                     inhparent;
+
+       /*
+        * Sequence number (starting with 1) of this parent, if this child table
+        * has multiple parents.
+        */
        int32           inhseqno;
+
+       /* true if the child is a partitioned table, false otherwise. */
+       bool            inhpartitioned;
 } FormData_pg_inherits;
 
 /* ----------------
@@ -46,10 +57,11 @@ typedef FormData_pg_inherits *Form_pg_inherits;
  *             compiler constants for pg_inherits
  * ----------------
  */
-#define Natts_pg_inherits                              3
-#define Anum_pg_inherits_inhrelid              1
-#define Anum_pg_inherits_inhparent             2
-#define Anum_pg_inherits_inhseqno              3
+#define Natts_pg_inherits                                      4
+#define Anum_pg_inherits_inhrelid                      1
+#define Anum_pg_inherits_inhparent                     2
+#define Anum_pg_inherits_inhseqno                      3
+#define Anum_pg_inherits_inhpartitioned                4
 
 /* ----------------
  *             pg_inherits has no initial contents
diff --git a/src/include/catalog/pg_inherits_fn.h b/src/include/catalog/pg_inherits_fn.h
index 7743388899..8f371acae7 100644
--- a/src/include/catalog/pg_inherits_fn.h
+++ b/src/include/catalog/pg_inherits_fn.h
@@ -17,9 +17,10 @@
 #include "nodes/pg_list.h"
 #include "storage/lock.h"
 
-extern List *find_inheritance_children(Oid parentrelId, LOCKMODE lockmode);
+extern List *find_inheritance_children(Oid parentrelId, LOCKMODE lockmode,
+                                                 int *num_partitioned_children);
 extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
-                                       List **parents);
+                                       List **parents, int *num_partitioned_children);
 extern bool has_subclass(Oid relationId);
 extern bool has_superclass(Oid relationId);
 extern bool typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId);
-- 
2.11.0

From ef86d03a6ed6ac0cdbdede0c1012f9006ed24de2 Mon Sep 17 00:00:00 2001
From: amit <amitlangot...@gmail.com>
Date: Thu, 10 Aug 2017 17:59:18 +0900
Subject: [PATCH 2/3] Allow locking only partitioned children in partition tree

find_inheritance_children will still return the OIDs of the
non-partitioned children, but does not lock them if the caller asks
it not to.

No caller passes 'true' yet, though.
---
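
A minimal, standalone sketch of the locking rule this patch adds (not code
from the patch itself).  It assumes, as the "i >= *num_partitioned_children"
test in the pg_inherits.c hunk suggests, that the child array is ordered with
the partitioned children first.  ChildInfo, lock_children_sketch() and the
"locked" flag are hypothetical stand-ins; no real lock is taken.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one entry of the sorted child array. */
typedef struct ChildInfo
{
	unsigned int relid;		/* stand-in for the child's Oid */
	bool		locked;		/* set here instead of taking a real lock */
} ChildInfo;

/*
 * Assumes children[0 .. num_partitioned - 1] are the partitioned children.
 * Every child stays in the array (i.e. would still be returned), but when
 * lock_only_partitioned_children is true only the partitioned ones are
 * marked as locked.  Returns the number of children marked in this call.
 */
int
lock_children_sketch(ChildInfo *children, int nchildren, int num_partitioned,
					 bool take_locks, bool lock_only_partitioned_children)
{
	int			nlocked = 0;

	assert(num_partitioned >= 0 && num_partitioned <= nchildren);

	for (int i = 0; i < nchildren; i++)
	{
		/* If requested, skip locking non-partitioned children. */
		if (lock_only_partitioned_children && i >= num_partitioned)
			continue;

		/* take_locks == false plays the role of lockmode == NoLock. */
		if (take_locks)
		{
			children[i].locked = true;
			nlocked++;
		}
	}

	return nlocked;
}
```

With four children of which the first two are partitioned, passing true for
the last argument marks only those two as locked; the other two remain in the
array untouched.  Note that the real hunk dereferences
num_partitioned_children, so whether a NULL argument is safe once a caller
passes 'true' depends on code outside the hunks shown here.
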
 contrib/sepgsql/dml.c                  |  3 ++-
 src/backend/catalog/partition.c        |  3 ++-
 src/backend/catalog/pg_inherits.c      | 20 ++++++++++++++++----
 src/backend/commands/analyze.c         |  4 ++--
 src/backend/commands/lockcmds.c        |  2 +-
 src/backend/commands/publicationcmds.c |  2 +-
 src/backend/commands/tablecmds.c       | 34 +++++++++++++++++-----------------
 src/backend/commands/vacuum.c          |  4 ++--
 src/backend/executor/execMain.c        |  4 ++--
 src/backend/optimizer/prep/prepunion.c |  2 +-
 src/include/catalog/pg_inherits_fn.h   |  2 ++
 11 files changed, 48 insertions(+), 32 deletions(-)

diff --git a/contrib/sepgsql/dml.c b/contrib/sepgsql/dml.c
index 6fc279805c..91f338f8bf 100644
--- a/contrib/sepgsql/dml.c
+++ b/contrib/sepgsql/dml.c
@@ -333,7 +333,8 @@ sepgsql_dml_privileges(List *rangeTabls, bool abort_on_violation)
                if (!rte->inh)
                        tableIds = list_make1_oid(rte->relid);
                else
-                       tableIds = find_all_inheritors(rte->relid, NoLock, NULL, NULL);
+                       tableIds = find_all_inheritors(rte->relid, NoLock, false,
+                                                      NULL, NULL);
 
                foreach(li, tableIds)
                {
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 36f5c80b4f..c972760fe4 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -196,7 +196,8 @@ RelationBuildPartitionDesc(Relation rel)
                return;
 
        /* Get partition oids from pg_inherits */
-       inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock, NULL);
+       inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock, false,
+                                           NULL);
 
        /* Collect bound spec nodes in a list */
        i = 0;
diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 5292ec8058..72420f65f1 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -74,13 +74,16 @@ inhchildinfo_cmp(const void *p1, const void *p2)
  * Returns a list containing the OIDs of all relations which
  * inherit *directly* from the relation with OID 'parentrelId'.
  *
- * The specified lock type is acquired on each child relation (but not on the
- * given rel; caller should already have locked it).  If lockmode is NoLock
- * then no locks are acquired, but caller must beware of race conditions
- * against possible DROPs of child relations.
+ * The specified lock type is acquired on each child relation (but not on the
+ * given rel; caller should already have locked it), unless
+ * lock_only_partitioned_children is true, in which case only partitioned
+ * children are locked.  If lockmode is NoLock then no locks are acquired, but
+ * caller must beware of race conditions against possible DROPs of child
+ * relations.
  */
 List *
 find_inheritance_children(Oid parentrelId, LOCKMODE lockmode,
+                                                 bool lock_only_partitioned_children,
                                                  int *num_partitioned_children)
 {
        List       *list = NIL;
@@ -162,6 +165,13 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode,
        {
                inhrelid = inhchildren[i].relid;
 
+               /* If requested, skip locking non-partitioned children. */
+               if (lock_only_partitioned_children && i >= *num_partitioned_children)
+               {
+                       list = lappend_oid(list, inhrelid);
+                       continue;
+               }
+
                if (lockmode != NoLock)
                {
                        /* Get the lock to synchronize against concurrent drop */
@@ -212,6 +222,7 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode,
  */
 List *
 find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
+                                       bool lock_only_partitioned_children,
                                        List **numparents, int *num_partitioned_children)
 {
        /* hash table for O(1) rel_oid -> rel_numparents cell lookup */
@@ -264,6 +275,7 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
 
                /* Get the direct children of this rel */
                currentchildren = find_inheritance_children(currentrel, lockmode,
+                                                           lock_only_partitioned_children,
                                                            &cur_num_partitioned_children);
 
                my_num_partitioned_children += cur_num_partitioned_children;
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 10cc2b8314..4bd374632f 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1282,8 +1282,8 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
         * the children.
         */
        tableOIDs =
-               find_all_inheritors(RelationGetRelid(onerel), AccessShareLock, NULL,
-                                                       NULL);
+               find_all_inheritors(RelationGetRelid(onerel), AccessShareLock, false,
+                                                       NULL, NULL);
 
        /*
         * Check that there's at least one descendant, else fail.  This could
diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c
index 529f244f7e..771aa11b1c 100644
--- a/src/backend/commands/lockcmds.c
+++ b/src/backend/commands/lockcmds.c
@@ -112,7 +112,7 @@ LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait)
        List       *children;
        ListCell   *lc;
 
-       children = find_inheritance_children(reloid, NoLock, NULL);
+       children = find_inheritance_children(reloid, NoLock, false, NULL);
 
        foreach(lc, children)
        {
diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c
index 64179ea3ef..4315028c66 100644
--- a/src/backend/commands/publicationcmds.c
+++ b/src/backend/commands/publicationcmds.c
@@ -516,7 +516,7 @@ OpenTableList(List *tables)
                        List       *children;
 
                        children = find_all_inheritors(myrelid, ShareUpdateExclusiveLock,
-                                                      NULL, NULL);
+                                                      false, NULL, NULL);
 
                        foreach(child, children)
                        {
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 4d686a6f71..ef3869854a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1239,8 +1239,8 @@ ExecuteTruncate(TruncateStmt *stmt)
                        ListCell   *child;
                        List       *children;
 
-                       children = find_all_inheritors(myrelid, AccessExclusiveLock, NULL,
-                                                      NULL);
+                       children = find_all_inheritors(myrelid, AccessExclusiveLock, false,
+                                                      NULL, NULL);
 
                        foreach(child, children)
                        {
@@ -2567,7 +2567,7 @@ renameatt_internal(Oid myrelid,
                 * calls to renameatt() can determine whether there are any parents
                 * outside the inheritance hierarchy being processed.
                 */
-               child_oids = find_all_inheritors(myrelid, AccessExclusiveLock,
+               child_oids = find_all_inheritors(myrelid, AccessExclusiveLock, false,
                                                 &child_numparents, NULL);
 
                /*
@@ -2595,7 +2595,7 @@ renameatt_internal(Oid myrelid,
                 * expected_parents will only be 0 if we are not already recursing.
                 */
                if (expected_parents == 0 &&
-                       find_inheritance_children(myrelid, NoLock, NULL) != NIL)
+                       find_inheritance_children(myrelid, NoLock, false, NULL) != NIL)
                        ereport(ERROR,
                                        (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
                                         errmsg("inherited column \"%s\" must be renamed in child tables too",
@@ -2778,7 +2778,7 @@ rename_constraint_internal(Oid myrelid,
                                           *li;
 
                        child_oids = find_all_inheritors(myrelid, AccessExclusiveLock,
-                                                        &child_numparents, NULL);
+                                                        false, &child_numparents, NULL);
 
                        forboth(lo, child_oids, li, child_numparents)
                        {
@@ -2794,7 +2794,7 @@ rename_constraint_internal(Oid myrelid,
                else
                {
                        if (expected_parents == 0 &&
-                               find_inheritance_children(myrelid, NoLock, NULL) != NIL)
+                               find_inheritance_children(myrelid, NoLock, false, NULL) != NIL)
                                ereport(ERROR,
                                                (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
                                                 errmsg("inherited constraint \"%s\" must be renamed in child tables too",
@@ -4807,7 +4807,7 @@ ATSimpleRecursion(List **wqueue, Relation rel,
                ListCell   *child;
                List       *children;
 
-               children = find_all_inheritors(relid, lockmode, NULL, NULL);
+               children = find_all_inheritors(relid, lockmode, false, NULL, NULL);
 
                /*
                 * find_all_inheritors does the recursive search of the inheritance
@@ -5216,7 +5216,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
         */
        if (colDef->identity &&
                recurse &&
-               find_inheritance_children(myrelid, NoLock, NULL) != NIL)
+               find_inheritance_children(myrelid, NoLock, false, NULL) != NIL)
                ereport(ERROR,
                                (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
                                 errmsg("cannot recursively add identity column to table that has child tables")));
@@ -5423,7 +5423,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
         * use find_all_inheritors to do it in one pass.
         */
        children = find_inheritance_children(RelationGetRelid(rel), lockmode,
-                                            NULL);
+                                            false, NULL);
 
        /*
         * If we are told not to recurse, there had better not be any child
@@ -6543,7 +6543,7 @@ ATExecDropColumn(List **wqueue, Relation rel, const char *colName,
         * use find_all_inheritors to do it in one pass.
         */
        children = find_inheritance_children(RelationGetRelid(rel), lockmode,
-                                            NULL);
+                                            false, NULL);
 
        if (children)
        {
@@ -6978,7 +6978,7 @@ ATAddCheckConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel,
         * use find_all_inheritors to do it in one pass.
         */
        children = find_inheritance_children(RelationGetRelid(rel), lockmode,
-                                            NULL);
+                                            false, NULL);
 
        /*
         * Check if ONLY was specified with ALTER TABLE.  If so, allow the
@@ -7699,7 +7699,7 @@ ATExecValidateConstraint(Relation rel, char *constrName, bool recurse,
                         */
                        if (!recursing && !con->connoinherit)
                                children = find_all_inheritors(RelationGetRelid(rel),
-                                                              lockmode, NULL, NULL);
+                                                              lockmode, false, NULL, NULL);
 
                        /*
                         * For CHECK constraints, we must ensure that we only mark the
@@ -8583,7 +8583,7 @@ ATExecDropConstraint(Relation rel, const char *constrName,
         */
        if (!is_no_inherit_constraint)
                children = find_inheritance_children(RelationGetRelid(rel), lockmode,
-                                                    NULL);
+                                                    false, NULL);
        else
                children = NIL;
 
@@ -8872,7 +8872,7 @@ ATPrepAlterColumnType(List **wqueue,
                ListCell   *child;
                List       *children;
 
-               children = find_all_inheritors(relid, lockmode, NULL, NULL);
+               children = find_all_inheritors(relid, lockmode, false, NULL, NULL);
 
                /*
                 * find_all_inheritors does the recursive search of the inheritance
@@ -8924,7 +8924,7 @@ ATPrepAlterColumnType(List **wqueue,
        }
        else if (!recursing &&
                         find_inheritance_children(RelationGetRelid(rel),
-                                                  NoLock, NULL) != NIL)
+                                                  NoLock, false, NULL) != NIL)
                ereport(ERROR,
                                (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
                                 errmsg("type of inherited column \"%s\" must be changed in child tables too",
@@ -11036,7 +11036,7 @@ ATExecAddInherit(Relation child_rel, RangeVar *parent, LOCKMODE lockmode)
         * We use weakest lock we can on child's children, namely AccessShareLock.
         */
        children = find_all_inheritors(RelationGetRelid(child_rel),
-                                      AccessShareLock, NULL, NULL);
+                                      AccessShareLock, false, NULL, NULL);
 
        if (list_member_oid(children, RelationGetRelid(parent_rel)))
                ereport(ERROR,
@@ -13707,7 +13707,7 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
         * weaker lock now and the stronger one only when needed.
         */
        attachrel_children = find_all_inheritors(RelationGetRelid(attachrel),
-                                                AccessExclusiveLock, NULL,
+                                                AccessExclusiveLock, false, NULL,
                                                 NULL);
        if (list_member_oid(attachrel_children, RelationGetRelid(rel)))
                ereport(ERROR,
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index e2e5ffce42..70cd5721f3 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -430,8 +430,8 @@ get_rel_oids(Oid relid, const RangeVar *vacrel)
                oldcontext = MemoryContextSwitchTo(vac_context);
                if (include_parts)
                        oid_list = list_concat(oid_list,
-                                              find_all_inheritors(relid, NoLock, NULL,
-                                                                  NULL));
+                                              find_all_inheritors(relid, NoLock, false,
+                                                                  NULL, NULL));
                else
                        oid_list = lappend_oid(oid_list, relid);
                MemoryContextSwitchTo(oldcontext);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4424649769..63529ab1dd 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -3278,8 +3278,8 @@ ExecSetupPartitionTupleRouting(Relation rel,
         * Get the information about the partition tree after locking all the
         * partitions.
         */
-       (void) find_all_inheritors(RelationGetRelid(rel), RowExclusiveLock, NULL,
-                                                          NULL);
+       (void) find_all_inheritors(RelationGetRelid(rel), RowExclusiveLock, false,
+                                                          NULL, NULL);
        RelationGetPartitionDispatchInfo(rel, &ptinfos, &leaf_parts);
 
        /*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index b84d6c8878..ee2e066263 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -1425,7 +1425,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
                lockmode = AccessShareLock;
 
        /* Scan for all members of inheritance set, acquire needed locks */
-       inhOIDs = find_all_inheritors(parentOID, lockmode, NULL, NULL);
+       inhOIDs = find_all_inheritors(parentOID, lockmode, false, NULL, NULL);
 
        /*
         * Check that there's at least one descendant, else treat as no-child
diff --git a/src/include/catalog/pg_inherits_fn.h b/src/include/catalog/pg_inherits_fn.h
index 8f371acae7..e568d11e43 100644
--- a/src/include/catalog/pg_inherits_fn.h
+++ b/src/include/catalog/pg_inherits_fn.h
@@ -18,8 +18,10 @@
 #include "storage/lock.h"
 
 extern List *find_inheritance_children(Oid parentrelId, LOCKMODE lockmode,
+                                                 bool lock_only_partitioned_children,
                                                  int *num_partitioned_children);
 extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
+                                       bool lock_only_partitioned_children,
                                        List **parents, int *num_partitioned_children);
 extern bool has_subclass(Oid relationId);
 extern bool has_superclass(Oid relationId);
-- 
2.11.0

From 49582f6707611a572b441bf692fd925e9d658781 Mon Sep 17 00:00:00 2001
From: amit <amitlangot...@gmail.com>
Date: Wed, 26 Jul 2017 14:42:47 +0900
Subject: [PATCH 3/3] WIP: Defer opening and locking partitions to
 set_append_rel_size

---
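
For orientation, a standalone sketch of what the placeholder
get_partitions_for_keys() added to partition.c by this patch does for now: it
returns every entry of pd->indexes, because the scan-key lookup is still a
TODO.  DispatchSketch and get_partitions_for_keys_sketch() are hypothetical
stand-ins, and a plain int array replaces the List that the real function
returns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for PartitionDispatch. */
typedef struct DispatchSketch
{
	int			nparts;		/* stand-in for pd->partdesc->nparts */
	const int  *indexes;	/* stand-in for pd->indexes */
} DispatchSketch;

/*
 * Copies the indexes of the partitions to scan into out[] and returns how
 * many were written.  Like the placeholder in the patch, no pruning is done:
 * every partition is selected, up to the capacity of out[].
 */
int
get_partitions_for_keys_sketch(const DispatchSketch *pd, int *out, int out_size)
{
	int			n = 0;

	assert(pd != NULL && out != NULL);

	for (int i = 0; i < pd->nparts && n < out_size; i++)
		out[n++] = pd->indexes[i];

	return n;
}
```

Once the scan-key interface exists, only the loop body would need to change:
it would skip the entries whose bounds cannot match the keys.
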
 src/backend/catalog/partition.c        |  20 ++
 src/backend/nodes/copyfuncs.c          |  17 --
 src/backend/nodes/equalfuncs.c         |  12 --
 src/backend/nodes/outfuncs.c           |  57 +++++-
 src/backend/optimizer/path/allpaths.c  | 357 +++++++++++++++++++++++++++++++--
 src/backend/optimizer/plan/planner.c   | 106 ++++++++--
 src/backend/optimizer/prep/prepunion.c | 266 +++++++++++++++---------
 src/backend/optimizer/util/plancat.c   |  44 ++++
 src/backend/optimizer/util/relnode.c   |  81 +++++++-
 src/backend/utils/cache/lsyscache.c    |  50 +++++
 src/include/catalog/partition.h        |   4 +
 src/include/nodes/nodes.h              |   5 +-
 src/include/nodes/relation.h           |  93 +++++++--
 src/include/optimizer/plancat.h        |   1 +
 src/include/optimizer/prep.h           |   3 +
 src/include/utils/lsyscache.h          |   2 +
 src/test/regress/expected/insert.out   |   4 +-
 17 files changed, 938 insertions(+), 184 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index c972760fe4..41127a584e 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1161,6 +1161,26 @@ RelationGetPartitionDispatchInfo(Relation rel,
        Assert((offset + 1) == list_length(*ptinfos));
 }
 
+/*
+ * get_partitions_for_keys
+ *             Returns the list of indexes (from pd->indexes) of the partitions
+ *             that will need to be scanned for the given scan keys.
+ *
+ * TODO: add the interface to pass the query scan keys and the logic to look
+ * up partitions using those keys.
+ */
+List *
+get_partitions_for_keys(PartitionDispatch pd)
+{
+       int             i;
+       List   *result = NIL;
+
+       for (i = 0; i < pd->partdesc->nparts; i++)
+               result = lappend_int(result, pd->indexes[i]);
+
+       return result;
+}
+
 /* Module-local functions */
 
 /*
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 72041693df..8d17d7f52c 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2249,20 +2249,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
 }
 
 /*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
-       PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
-       COPY_SCALAR_FIELD(parent_relid);
-       COPY_NODE_FIELD(child_rels);
-
-       return newnode;
-}
-
-/*
  * _copyPlaceHolderInfo
  */
 static PlaceHolderInfo *
@@ -4994,9 +4980,6 @@ copyObjectImpl(const void *from)
                case T_AppendRelInfo:
                        retval = _copyAppendRelInfo(from);
                        break;
-               case T_PartitionedChildRelInfo:
-                       retval = _copyPartitionedChildRelInfo(from);
-                       break;
                case T_PlaceHolderInfo:
                        retval = _copyPlaceHolderInfo(from);
                        break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 8d92c03633..fb248f31f3 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -905,15 +905,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
 }
 
 static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
-       COMPARE_SCALAR_FIELD(parent_relid);
-       COMPARE_NODE_FIELD(child_rels);
-
-       return true;
-}
-
-static bool
 _equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
 {
        COMPARE_SCALAR_FIELD(phid);
@@ -3155,9 +3146,6 @@ equal(const void *a, const void *b)
                case T_AppendRelInfo:
                        retval = _equalAppendRelInfo(a, b);
                        break;
-               case T_PartitionedChildRelInfo:
-                       retval = _equalPartitionedChildRelInfo(a, b);
-                       break;
                case T_PlaceHolderInfo:
                        retval = _equalPlaceHolderInfo(a, b);
                        break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 5ce3c7c599..1c7caca013 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2211,7 +2211,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
        WRITE_NODE_FIELD(full_join_clauses);
        WRITE_NODE_FIELD(join_info_list);
        WRITE_NODE_FIELD(append_rel_list);
-       WRITE_NODE_FIELD(pcinfo_list);
+       WRITE_NODE_FIELD(prinfo_list);
        WRITE_NODE_FIELD(rowMarks);
        WRITE_NODE_FIELD(placeholder_list);
        WRITE_NODE_FIELD(fkey_list);
@@ -2285,6 +2285,12 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
        WRITE_NODE_FIELD(joininfo);
        WRITE_BOOL_FIELD(has_eclass_joins);
        WRITE_BITMAPSET_FIELD(top_parent_relids);
+       WRITE_INT_FIELD(num_parted);
+       /* don't bother printing partition_infos */
+       WRITE_INT_FIELD(num_leaf_parts);
+       /* don't bother printing leaf_part_infos */
+       WRITE_NODE_FIELD(live_partition_painfos);
+       WRITE_UINT_FIELD(root_parent_relid);
 }
 
 static void
@@ -2510,12 +2516,42 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
 }
 
 static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
+_outPartitionInfo(StringInfo str, const PartitionInfo *node)
 {
-       WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
+       WRITE_NODE_TYPE("PARTITIONINFO");
+
+       WRITE_UINT_FIELD(relid);
+       /* Don't bother writing out the PartitionDispatch object */
+}
+
+static void
+_outLeafPartitionInfo(StringInfo str, const LeafPartitionInfo *node)
+{
+       WRITE_NODE_TYPE("LEAFPARTITIONINFO");
+
+       WRITE_OID_FIELD(reloid);
+       WRITE_UINT_FIELD(relid);
+}
+
+static void
+_outPartitionAppendInfo(StringInfo str, const PartitionAppendInfo *node)
+{
+       WRITE_NODE_TYPE("PARTITIONAPPENDINFO");
+
+       WRITE_UINT_FIELD(parent_relid);
+       WRITE_NODE_FIELD(live_partition_relids);
+}
+
+static void
+_outPartitionRootInfo(StringInfo str, const PartitionRootInfo *node)
+{
+       WRITE_NODE_TYPE("PARTITIONROOTINFO");
 
        WRITE_UINT_FIELD(parent_relid);
-       WRITE_NODE_FIELD(child_rels);
+       WRITE_NODE_FIELD(partition_infos);
+       WRITE_NODE_FIELD(partitioned_relids);
+       WRITE_NODE_FIELD(leaf_part_infos);
+       WRITE_NODE_FIELD(orig_leaf_part_oids);
 }
 
 static void
@@ -4043,8 +4079,17 @@ outNode(StringInfo str, const void *obj)
                        case T_AppendRelInfo:
                                _outAppendRelInfo(str, obj);
                                break;
-                       case T_PartitionedChildRelInfo:
-                               _outPartitionedChildRelInfo(str, obj);
+                       case T_PartitionInfo:
+                               _outPartitionInfo(str, obj);
+                               break;
+                       case T_LeafPartitionInfo:
+                               _outLeafPartitionInfo(str, obj);
+                               break;
+                       case T_PartitionAppendInfo:
+                               _outPartitionAppendInfo(str, obj);
+                               break;
+                       case T_PartitionRootInfo:
+                               _outPartitionRootInfo(str, obj);
                                break;
                        case T_PlaceHolderInfo:
                                _outPlaceHolderInfo(str, obj);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 2d7e1d84d0..c9c0b85cd9 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,6 +20,7 @@
 
 #include "access/sysattr.h"
 #include "access/tsmapi.h"
+#include "catalog/partition.h"
 #include "catalog/pg_class.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_proc.h"
@@ -43,6 +44,8 @@
 #include "parser/parse_clause.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
+#include "storage/lmgr.h"
+#include "utils/builtins.h"
 #include "utils/lsyscache.h"
 
 
@@ -334,7 +337,7 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel,
                 */
                set_dummy_rel_pathlist(rel);
        }
-       else if (rte->inh)
+       else if (rte->inh || rte->relkind == RELKIND_PARTITIONED_TABLE)
        {
                /* It's an "append relation", process accordingly */
                set_append_rel_size(root, rel, rti, rte);
@@ -425,7 +428,7 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
        {
                /* We already proved the relation empty, so nothing more to do */
        }
-       else if (rte->inh)
+       else if (rte->inh || rte->relkind == RELKIND_PARTITIONED_TABLE)
        {
                /* It's an "append relation", process accordingly */
                set_append_rel_pathlist(root, rel, rti, rte);
@@ -845,6 +848,166 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 }
 
 /*
+ * get_rel_partitions_recurse
+ *             Find partitions of the partitioned table described in partinfo,
+ *             recursing for those partitions that are themselves partitioned tables
+ *
+ * rootrel is the root of the partition tree of which this table is a part.
+ * We create a PartitionAppendInfo for this partitioned table and append it to
+ * rootrel->live_partition_painfos.
+ *
+ * Returns a list of the OIDs of the leaf partitions of this table.
+ */
+static List *
+get_rel_partitions_recurse(RelOptInfo *rootrel,
+                                                  PartitionInfo *partinfo,
+                                                  PartitionInfo **all_partinfos,
+                                                  LeafPartitionInfo **leaf_part_infos)
+{
+       PartitionAppendInfo *painfo;
+       List   *indexes;
+       List   *result = NIL,
+                  *my_live_partitions = NIL;
+       ListCell *l;
+
+       /*
+        * Create a PartitionAppendInfo to map this table to the child tables
+        * that will be its Append children.
+        */
+       painfo = makeNode(PartitionAppendInfo);
+       painfo->parent_relid = partinfo->relid;
+
+       /* They will all be under the root table's Append node. */
+       rootrel->live_partition_painfos = lappend(rootrel->live_partition_painfos,
+                                                 painfo);
+
+       /*
+        * TODO: collect the keys by looking at the clauses in
+        * rootrel->baserestrictinfo considering this table's partition keys.
+        */
+
+       /* Ask partition.c which partitions it thinks match the keys. */
+       indexes = get_partitions_for_keys(partinfo->pd);
+
+       /* Collect leaf partitions in the result list and recurse for others. */
+       foreach(l, indexes)
+       {
+               int             index = lfirst_int(l);
+
+               if (index >= 0)
+               {
+                       LeafPartitionInfo *lpinfo = leaf_part_infos[index];
+
+                       result = lappend_oid(result, lpinfo->reloid);
+                       my_live_partitions = lappend_int(my_live_partitions,
+                                                        lpinfo->relid);
+               }
+               else
+               {
+                       PartitionInfo *recurse_partinfo = all_partinfos[-index];
+                       List              *my_leaf_partitions;
+
+                       my_live_partitions = lappend_int(my_live_partitions,
+                                                        recurse_partinfo->relid);
+                       my_leaf_partitions = get_rel_partitions_recurse(rootrel,
+                                                        recurse_partinfo,
+                                                        all_partinfos,
+                                                        leaf_part_infos);
+                       result = list_concat(result, my_leaf_partitions);
+               }
+       }
+
+       painfo->live_partition_relids = my_live_partitions;
+
+       return result;
+}
+
+/*
+ * get_rel_partitions
+ *             Recursively find partitions of rel
+ */
+static List *
+get_rel_partitions(RelOptInfo *rel)
+{
+       return get_rel_partitions_recurse(rel,
+                                         rel->partition_infos[0],
+                                         rel->partition_infos,
+                                         rel->leaf_part_infos);
+}
+
+/*
+ * find_partitions_for_query
+ *             Find and lock partitions of rel relevant to this query
+ *
+ * Note that we only ever need to lock the leaf partitions, because the
+ * partitioned tables in the partition tree have already been locked.
+ */
+static void
+find_partitions_for_query(PlannerInfo *root, RelOptInfo *rel)
+{
+       List       *leaf_part_oids = NIL;
+       ListCell   *l;
+       PlanRowMark *rc = NULL;
+       int             lockmode;
+       int             num_leaf_parts,
+                       i;
+       Oid        *leaf_part_oids_array;
+       PartitionRootInfo *prinfo = NULL;
+
+       /* Find partitions. */
+       Assert(rel->partition_infos != NULL);
+       leaf_part_oids = get_rel_partitions(rel);
+
+       /* Convert the list to an array and sort for binary searching later. */
+       num_leaf_parts = list_length(leaf_part_oids);
+       leaf_part_oids_array = (Oid *) palloc(num_leaf_parts * sizeof(Oid));
+       i = 0;
+       foreach(l, leaf_part_oids)
+       {
+               leaf_part_oids_array[i++] = lfirst_oid(l);
+       }
+       qsort(leaf_part_oids_array, num_leaf_parts, sizeof(Oid), oid_cmp);
+
+       /*
+        * Now lock partitions.  Note that rel cannot be a result relation or we
+        * wouldn't be here (inheritance_planner is where result relations go).
+        */
+       rc = get_plan_rowmark(root->rowMarks, rel->relid);
+       if (rc && RowMarkRequiresRowShareLock(rc->markType))
+               lockmode = RowShareLock;
+       else
+               lockmode = AccessShareLock;
+
+       /*
+        * We lock leaf partitions in the order in which find_all_inheritors
+        * found them in expand_inherited_rtentry().  Find that list by locating
+        * the PartitionRootInfo for this table.
+        */
+       foreach(l, root->prinfo_list)
+       {
+               prinfo = lfirst(l);
+
+               if (rel->relid == prinfo->parent_relid)
+                       break;
+       }
+       Assert(prinfo != NULL && rel->relid == prinfo->parent_relid);
+       foreach(l, prinfo->orig_leaf_part_oids)
+       {
+               Oid             relid = lfirst_oid(l);
+               Oid        *test;
+
+               /* Will this leaf partition be scanned? */
+               test = (Oid *) bsearch(&relid,
+                                                          leaf_part_oids_array,
+                                                          num_leaf_parts,
+                                                          sizeof(Oid), oid_cmp);
+               /* Yep, so lock. */
+               if (test != NULL)
+                       LockRelationOid(relid, lockmode);
+       }
+}
+
+/*
  * set_append_rel_size
  *       Set size estimates for a simple "append relation"
  *
@@ -866,6 +1029,134 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
        double     *parent_attrsizes;
        int                     nattrs;
        ListCell   *l;
+       List       *rel_appinfos = NIL;
+
+       /*
+        * Collect a list of child AppendRelInfos, which in the non-partitioned
+        * case will be found in root->append_rel_list.  In the partitioned
+        * table's case, we haven't built any AppendRelInfos yet.  We will
+        * build them after figuring out which of the table's child tables
+        * (aka partitions) need to be scanned for this query.
+        */
+       if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+       {
+               foreach(l, root->append_rel_list)
+               {
+                       AppendRelInfo *appinfo = lfirst(l);
+
+                       /* append_rel_list contains all append rels; ignore others */
+                       if (appinfo->parent_relid == parentRTindex)
+                               rel_appinfos = lappend(rel_appinfos, appinfo);
+               }
+       }
+       else
+       {
+               List       *live_partitions = NIL;
+               Relation        parent;
+               List       *parent_vars;
+               RelOptInfo *rootrel;
+
+               /*
+                * If this is a partitioned table root, we will determine all the
+                * partitions in this partition tree that we need to scan for this
+                * query.  Among those, the partitions that have not yet been locked
+                * (viz. the leaf partitions) will be locked now.
+                */
+               if (rel->partition_infos != NULL)
+               {
+                       PartitionAppendInfo *painfo;
+
+                       rootrel = rel;
+                       find_partitions_for_query(root, rel);
+                       painfo = linitial(rel->live_partition_painfos);
+                       Assert(rti == painfo->parent_relid);
+                       live_partitions = painfo->live_partition_relids;
+               }
+               else
+               {
+                       /*
+                        * Just need to get hold of the PartitionAppendInfo via the root
+                        * parent's RelOptInfo.
+                        */
+                       rootrel = root->simple_rel_array[rel->root_parent_relid];
+                       foreach(l, rootrel->live_partition_painfos)
+                       {
+                               PartitionAppendInfo *painfo = lfirst(l);
+
+                               if (rti == painfo->parent_relid)
+                               {
+                                       live_partitions = painfo->live_partition_relids;
+                                       break;
+                               }
+                       }
+               }
+
+               /*
+                * Create an AppendRelInfo and a RelOptInfo for every candidate
+                * partition.
+                */
+               parent = heap_open(rte->relid, NoLock);
+               parent_vars = build_rel_vars(rte, rti);
+               foreach(l, live_partitions)
+               {
+                       Index           childRTindex = lfirst_int(l);
+                       RangeTblEntry *childrte = planner_rt_fetch(childRTindex, root);
+                       Relation        child;
+                       AppendRelInfo *appinfo;
+                       RelOptInfo        *childrel;
+
+                       child = heap_open(childrte->relid, NoLock);     /* already locked! */
+                       appinfo = makeNode(AppendRelInfo);
+                       appinfo->parent_relid = rti;
+                       appinfo->child_relid = childRTindex;
+                       appinfo->parent_reltype = parent->rd_rel->reltype;
+                       appinfo->child_reltype = child->rd_rel->reltype;
+                       appinfo->translated_vars = map_partition_varattnos(parent_vars,
+                                                                          rti,
+                                                                          child, parent,
+                                                                          NULL);
+                       ChangeVarNodes((Node *) appinfo->translated_vars,
+                                                  rti, childRTindex, 0);
+                       appinfo->parent_reloid = rte->relid;
+                       rel_appinfos = lappend(rel_appinfos, appinfo);
+                       root->append_rel_list = lappend(root->append_rel_list, appinfo);
+
+                       /*
+                        * Translate the column permissions bitmaps to the child's attnums
+                        * (we have to build the translated_vars list before we can do
+                        * this). But if this is the parent table, leave copyObject's
+                        * result alone.
+                        *
+                        * Note: we need to do this even though the executor won't run any
+                        * permissions checks on the child RTE.  The
+                        * insertedCols/updatedCols bitmaps may be examined for
+                        * trigger-firing purposes.
+                        */
+                       childrte->selectedCols = translate_col_privs(rte->selectedCols,
+                                                                    appinfo->translated_vars);
+                       childrte->insertedCols = translate_col_privs(rte->insertedCols,
+                                                                    appinfo->translated_vars);
+                       childrte->updatedCols = translate_col_privs(rte->updatedCols,
+                                                                   appinfo->translated_vars);
+
+                       childrel = build_simple_rel(root, childRTindex, rel);
+                       childrel->root_parent_relid = rootrel->relid;
+                       Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
+
+                       /* Copy the data that create_lateral_join_info() created */
+                       Assert(childrel->direct_lateral_relids == NULL);
+                       childrel->direct_lateral_relids = rel->direct_lateral_relids;
+                       Assert(childrel->lateral_relids == NULL);
+                       childrel->lateral_relids = rel->lateral_relids;
+                       Assert(childrel->lateral_referencers == NULL);
+                       childrel->lateral_referencers = rel->lateral_referencers;
+
+                       root->total_table_pages += childrel->pages;
+
+                       heap_close(child, NoLock);
+               }
+               heap_close(parent, NoLock);
+       }
 
        Assert(IS_SIMPLE_REL(rel));
 
@@ -889,7 +1180,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
        nattrs = rel->max_attr - rel->min_attr + 1;
        parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
 
-       foreach(l, root->append_rel_list)
+       foreach(l, rel_appinfos)
        {
                AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
                int                     childRTindex;
@@ -902,10 +1193,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
                ListCell   *childvars;
                ListCell   *lc;
 
-               /* append_rel_list contains all append rels; ignore others */
-               if (appinfo->parent_relid != parentRTindex)
-                       continue;
-
                childRTindex = appinfo->child_relid;
                childRTE = root->simple_rte_array[childRTindex];
 
@@ -1211,24 +1498,61 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
        int                     parentRTindex = rti;
        List       *live_childrels = NIL;
        ListCell   *l;
+       List       *append_rel_children = NIL;
+
+       if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+       {
+               foreach(l, root->append_rel_list)
+               {
+                       AppendRelInfo *appinfo = lfirst(l);
+
+                       /* append_rel_list contains all append rels; ignore others */
+                       if (appinfo->parent_relid == parentRTindex)
+                               append_rel_children = lappend_int(append_rel_children,
+                                                                 appinfo->child_relid);
+               }
+       }
+       else
+       {
+               /* For a partitioned table, first find its PartitionAppendInfo */
+               if (rel->live_partition_painfos != NIL)
+               {
+                       PartitionAppendInfo *painfo;
+
+                       /* This is the root partitioned rel. */
+                       painfo = linitial(rel->live_partition_painfos);
+                       append_rel_children = painfo->live_partition_relids;
+               }
+               else
+               {
+                       RelOptInfo *rootrel;
+
+                       /* Non-root partitioned table.  Get it from the root rel. */
+                       rootrel = root->simple_rel_array[rel->root_parent_relid];
+                       foreach(l, rootrel->live_partition_painfos)
+                       {
+                               PartitionAppendInfo *painfo = lfirst(l);
+
+                               if (rti == painfo->parent_relid)
+                               {
+                                       append_rel_children = painfo->live_partition_relids;
+                                       break;
+                               }
+                       }
+               }
+       }
 
        /*
         * Generate access paths for each member relation, and remember the
         * non-dummy children.
         */
-       foreach(l, root->append_rel_list)
+       foreach(l, append_rel_children)
        {
-               AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
-               int                     childRTindex;
+               int                     childRTindex = lfirst_int(l);
                RangeTblEntry *childRTE;
                RelOptInfo *childrel;
 
-               /* append_rel_list contains all append rels; ignore others */
-               if (appinfo->parent_relid != parentRTindex)
-                       continue;
-
                /* Re-locate the child RTE and RelOptInfo */
-               childRTindex = appinfo->child_relid;
                childRTE = root->simple_rte_array[childRTindex];
                childrel = root->simple_rel_array[childRTindex];
 
@@ -1289,7 +1613,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
        RangeTblEntry *rte;
 
        rte = planner_rt_fetch(rel->relid, root);
-       if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+       /* Note that only a root partitioned table would have the inh flag set. */
+       if (rte->relkind == RELKIND_PARTITIONED_TABLE && rte->inh)
        {
                partitioned_rels = get_partitioned_child_rels(root, rel->relid);
                /* The root partitioned table is included as a child rel */
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index fdef00ab39..09dd32de79 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -514,7 +514,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
        root->multiexpr_params = NIL;
        root->eq_classes = NIL;
        root->append_rel_list = NIL;
-       root->pcinfo_list = NIL;
+       root->prinfo_list = NIL;
        root->rowMarks = NIL;
        memset(root->upper_rels, 0, sizeof(root->upper_rels));
        memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -1050,6 +1050,93 @@ inheritance_planner(PlannerInfo *root)
        Index           rti;
        RangeTblEntry *parent_rte;
        List       *partitioned_rels = NIL;
+       List       *rel_appinfos = NIL;
+       ListCell   *l;
+
+       parent_rte = rt_fetch(parentRTindex, root->parse->rtable);
+       if (parent_rte->relkind != RELKIND_PARTITIONED_TABLE)
+       {
+               foreach(l, root->append_rel_list)
+               {
+                       AppendRelInfo *appinfo = lfirst(l);
+
+                       /* append_rel_list contains all append rels; ignore others */
+                       if (appinfo->parent_relid == parentRTindex)
+                               rel_appinfos = lappend(rel_appinfos, appinfo);
+               }
+       }
+       else
+       {
+               PartitionRootInfo *prinfo = NULL;
+               Relation        parent;
+               List       *parent_vars = build_rel_vars(parent_rte, parentRTindex);
+
+               /* Find the PartitionRootInfo for this rel */
+               foreach(l, root->prinfo_list)
+               {
+                       prinfo = lfirst(l);
+
+                       if (prinfo->parent_relid == parentRTindex)
+                               break;
+               }
+               Assert(prinfo != NULL && prinfo->parent_relid == parentRTindex);
+
+               parent = heap_open(parent_rte->relid, NoLock);
+               foreach(l, prinfo->leaf_part_infos)
+               {
+                       LeafPartitionInfo *lpinfo = lfirst(l);
+                       Index           childRTindex = lpinfo->relid;
+                       RangeTblEntry *childrte = planner_rt_fetch(childRTindex, root);
+                       Relation        child;
+                       AppendRelInfo *appinfo;
+
+                       if (childrte->relkind == RELKIND_PARTITIONED_TABLE)
+                               continue;
+
+                       /*
+                        * We'll need RowExclusiveLock, because just like the parent, each
+                        * child is a result relation.
+                        */
+                       child = heap_open(childrte->relid, RowExclusiveLock);
+                       appinfo = makeNode(AppendRelInfo);
+                       appinfo->parent_relid = parentRTindex;
+                       appinfo->child_relid = childRTindex;
+                       appinfo->parent_reltype = parent->rd_rel->reltype;
+                       appinfo->child_reltype = child->rd_rel->reltype;
+                       appinfo->translated_vars = map_partition_varattnos(parent_vars,
+                                                                          parentRTindex,
+                                                                          child, parent,
+                                                                          NULL);
+                       ChangeVarNodes((Node *) appinfo->translated_vars,
+                                                  parentRTindex, childRTindex, 0);
+                       appinfo->parent_reloid = RelationGetRelid(parent);
+                       rel_appinfos = lappend(rel_appinfos, appinfo);
+                       root->append_rel_list = lappend(root->append_rel_list, appinfo);
+
+                       /*
+                        * Translate the column permissions bitmaps to the child's attnums
+                        * (we have to build the translated_vars list before we can do
+                        * this). But if this is the parent table, leave copyObject's
+                        * result alone.
+                        *
+                        * Note: we need to do this even though the executor won't run any
+                        * permissions checks on the child RTE.  The
+                        * insertedCols/updatedCols bitmaps may be examined for
+                        * trigger-firing purposes.
+                        */
+                       childrte->selectedCols =
+                               translate_col_privs(parent_rte->selectedCols,
+                                                   appinfo->translated_vars);
+                       childrte->insertedCols =
+                               translate_col_privs(parent_rte->insertedCols,
+                                                   appinfo->translated_vars);
+                       childrte->updatedCols =
+                               translate_col_privs(parent_rte->updatedCols,
+                                                   appinfo->translated_vars);
+                       heap_close(child, NoLock);
+               }
+               heap_close(parent, NoLock);
+       }
 
        Assert(parse->commandType != CMD_INSERT);
 
@@ -1115,14 +1202,13 @@ inheritance_planner(PlannerInfo *root)
         * opposite in the case of non-partitioned inheritance parent as described
         * below.
         */
-       parent_rte = rt_fetch(parentRTindex, root->parse->rtable);
        if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
                nominalRelation = parentRTindex;
 
        /*
         * And now we can get on with generating a plan for each child table.
         */
-       foreach(lc, root->append_rel_list)
+       foreach(lc, rel_appinfos)
        {
                AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
                PlannerInfo *subroot;
@@ -1130,10 +1216,6 @@ inheritance_planner(PlannerInfo *root)
                RelOptInfo *sub_final_rel;
                Path       *subpath;
 
-               /* append_rel_list contains all append rels; ignore others */
-               if (appinfo->parent_relid != parentRTindex)
-                       continue;
-
                /*
                 * We need a working copy of the PlannerInfo so that we can control
                 * propagation of information back to the main copy.
@@ -6070,7 +6152,7 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
  *             Returns a list of the RT indexes of the partitioned child relations
  *             with rti as the root parent RT index.
  *
- * Note: Only call this function on RTEs known to be partitioned tables.
+ * Note: Only call this function on an RTE known to be a root partitioned table.
  */
 List *
 get_partitioned_child_rels(PlannerInfo *root, Index rti)
@@ -6078,13 +6160,13 @@ get_partitioned_child_rels(PlannerInfo *root, Index rti)
        List       *result = NIL;
        ListCell   *l;
 
-       foreach(l, root->pcinfo_list)
+       foreach(l, root->prinfo_list)
        {
-               PartitionedChildRelInfo *pc = lfirst(l);
+               PartitionRootInfo *prinfo = lfirst(l);
 
-               if (pc->parent_relid == rti)
+               if (prinfo->parent_relid == rti)
                {
-                       result = pc->child_rels;
+                       result = prinfo->partitioned_relids;
                        break;
                }
        }
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index ee2e066263..4b4d95eb63 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -105,8 +105,6 @@ static void make_inh_translation_list(Relation oldrelation,
                                                  Relation newrelation,
                                                  Index newvarno,
                                                  List **translated_vars);
-static Bitmapset *translate_col_privs(const Bitmapset *parent_privs,
-                                       List *translated_vars);
 static Node *adjust_appendrel_attrs_mutator(Node *node,
                                                           adjust_appendrel_attrs_context *context);
 static Relids adjust_child_relids(Relids relids, int nappinfos,
@@ -1352,11 +1350,19 @@ expand_inherited_tables(PlannerInfo *root)
 
 /*
  * expand_inherited_rtentry
- *             Check whether a rangetable entry represents an inheritance set.
- *             If so, add entries for all the child tables to the query's
- *             rangetable, and build AppendRelInfo nodes for all the child tables
- *             and add them to root->append_rel_list.  If not, clear the entry's
- *             "inh" flag to prevent later code from looking for AppendRelInfos.
+ *             Perform actions necessary for applying this query to an inheritance
+ *             set if the rte represents one
+ *
+ * That includes adding entries for all the child tables to the query's
+ * rangetable.  Also, if this query requires a PlanRowMark, generate the same
+ * for each child table and append them to the planner's global list
+ * (root->rowMarks).  If the inheritance set is really a partitioned table,
+ * our work here is done.  If not, we also create AppendRelInfo nodes for
+ * all the child tables and add them to root->append_rel_list.
+ *
+ * If it turns out that the rte is not (or no longer) an inheritance set,
+ * clear the entry's "inh" flag to prevent later code from looking for
+ * AppendRelInfos.
  *
  * Note that the original RTE is considered to represent the whole
  * inheritance set.  The first of the generated RTEs is an RTE for the same
@@ -1381,9 +1387,13 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
        List       *inhOIDs;
        List       *appinfos;
        ListCell   *l;
-       bool            has_child;
-       PartitionedChildRelInfo *pcinfo;
        List       *partitioned_child_rels = NIL;
+       List       *partition_infos = NIL;
+       List       *leaf_part_infos = NIL;
+       List       *orig_leaf_part_oids;
+       int                     num_partitioned_children;
+       PartitionedTableInfo *ptinfo;
+       PartitionInfo *pinfo;
 
        /* Does RT entry allow inheritance? */
        if (!rte->inh)
@@ -1408,6 +1418,11 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
         * relation named in the query.  However, for each child relation we add
          * to the query, we must obtain an appropriate lock, because this will be
          * the first use of those relations in the parse/rewrite/plan pipeline.
+        * For a partitioned table, we defer locking the non-partitioned child
+        * tables until we actually know that they will be scanned (see below:
+        * we use RelationGetPartitionDispatchInfo() to get the list of child
+        * tables of partitioned tables, not find_all_inheritors(), which would
+        * lock the child tables).
         *
         * If the parent relation is the query's result relation, then we need
         * RowExclusiveLock.  Otherwise, if it's accessed FOR UPDATE/SHARE, we
@@ -1425,7 +1440,8 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
                lockmode = AccessShareLock;
 
        /* Scan for all members of inheritance set, acquire needed locks */
-       inhOIDs = find_all_inheritors(parentOID, lockmode, false, NULL, NULL);
+       inhOIDs = find_all_inheritors(parentOID, lockmode, true, NULL,
+                                                                 &num_partitioned_children);
 
        /*
         * Check that there's at least one descendant, else treat as no-child
@@ -1461,9 +1477,17 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
        {
                List   *leaf_part_oids,
                           *ptinfos;
+               int     rtable_length = list_length(parse->rtable),
+                               i;
+
+               /*
+                * Keep leaf partition OIDs around so that we can lock them in this
+                * order when we eventually do it.
+                */
+               orig_leaf_part_oids = list_copy_tail(inhOIDs,
+                                                    num_partitioned_children + 1);
 
-               /* Discard the original list. */
-               list_free(inhOIDs);
+               /* Discard the original inhOIDs list. */
                inhOIDs = NIL;
 
                /* Request partitioning information. */
@@ -1471,14 +1495,37 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
                                                                                 &leaf_part_oids);
 
                /*
-                * First collect the partitioned child table OIDs, which includes the
-                * root parent at the head.
+                * We make a PartitionInfo object for every partitioned table in the
+                * tree, including the root table.  We create the root table's
+                * PartitionInfo outside the loop, because we'd like to use its
+                * original RT index, whereas for the child partitioned tables, we'll
+                * use their to-be RT indexes.
                 */
+               ptinfo = linitial(ptinfos);
+               pinfo = makeNode(PartitionInfo);
+               pinfo->relid = rti;
+               pinfo->pd = ptinfo->pd;
+               partition_infos = list_make1(pinfo);
+
+               /* Keep only the child tables' PartitionedTableInfos in the list. */
+               ptinfos = list_delete_first(ptinfos);
+
+               /*
+                * First collect the partitioned child table OIDs.  Note that the list
+                * won't contain the root table's OID because we removed its ptinfo
+                * from the list above.
+                */
+               i = 1;
                foreach(l, ptinfos)
                {
                        PartitionedTableInfo *ptinfo = lfirst(l);
+                       PartitionInfo *pinfo = makeNode(PartitionInfo);
 
                        inhOIDs = lappend_oid(inhOIDs, ptinfo->relid);
+                       pinfo->relid = rtable_length + i;
+                       pinfo->pd = ptinfo->pd;
+                       partition_infos = lappend(partition_infos, pinfo);
+                       i++;
                }
 
                /* Concatenate the leaf partition OIDs. */
@@ -1487,7 +1534,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
 
        /* Scan the inheritance set and expand it */
        appinfos = NIL;
-       has_child = false;
        foreach(l, inhOIDs)
        {
                Oid                     childOID = lfirst_oid(l);
@@ -1496,23 +1542,14 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
                Index           childRTindex;
                AppendRelInfo *appinfo;
 
-               /* Open rel if needed; we already have required locks */
-               if (childOID != parentOID)
-                       newrelation = heap_open(childOID, NoLock);
-               else
-                       newrelation = oldrelation;
-
                /*
                  * It is possible that the parent table has children that are temp
                  * tables of other backends.  We cannot safely access such tables
                  * (because of buffering issues), and the best thing to do seems to be
                  * to silently ignore them.
                  */
-               if (childOID != parentOID && RELATION_IS_OTHER_TEMP(newrelation))
-               {
-                       heap_close(newrelation, lockmode);
+               if (childOID != parentOID && rel_is_other_temp(childOID))
                        continue;
-               }
 
                /*
                 * Build an RTE for the child, and attach to query's rangetable 
list.
@@ -1528,7 +1565,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry 
*rte, Index rti)
                 */
                childrte = copyObject(rte);
                childrte->relid = childOID;
-               childrte->relkind = newrelation->rd_rel->relkind;
+               childrte->relkind = get_rel_relkind(childOID);
                childrte->inh = false;
                childrte->requiredPerms = 0;
                childrte->securityQuals = NIL;
@@ -1536,51 +1573,6 @@ expand_inherited_rtentry(PlannerInfo *root, 
RangeTblEntry *rte, Index rti)
                childRTindex = list_length(parse->rtable);
 
                /*
-                * Build an AppendRelInfo for this parent and child, unless the 
child
-                * is a partitioned table.
-                */
-               if (childrte->relkind != RELKIND_PARTITIONED_TABLE)
-               {
-                       /* Remember if we saw a real child. */
-                       if (childOID != parentOID)
-                               has_child = true;
-
-                       appinfo = makeNode(AppendRelInfo);
-                       appinfo->parent_relid = rti;
-                       appinfo->child_relid = childRTindex;
-                       appinfo->parent_reltype = oldrelation->rd_rel->reltype;
-                       appinfo->child_reltype = newrelation->rd_rel->reltype;
-                       make_inh_translation_list(oldrelation, newrelation, 
childRTindex,
-                                                                         
&appinfo->translated_vars);
-                       appinfo->parent_reloid = parentOID;
-                       appinfos = lappend(appinfos, appinfo);
-
-                       /*
-                        * Translate the column permissions bitmaps to the 
child's attnums
-                        * (we have to build the translated_vars list before we 
can do
-                        * this). But if this is the parent table, leave 
copyObject's
-                        * result alone.
-                        *
-                        * Note: we need to do this even though the executor 
won't run any
-                        * permissions checks on the child RTE.  The
-                        * insertedCols/updatedCols bitmaps may be examined for
-                        * trigger-firing purposes.
-                        */
-                       if (childOID != parentOID)
-                       {
-                               childrte->selectedCols = 
translate_col_privs(rte->selectedCols,
-                                                                               
                                         appinfo->translated_vars);
-                               childrte->insertedCols = 
translate_col_privs(rte->insertedCols,
-                                                                               
                                         appinfo->translated_vars);
-                               childrte->updatedCols = 
translate_col_privs(rte->updatedCols,
-                                                                               
                                        appinfo->translated_vars);
-                       }
-               }
-               else
-                       partitioned_child_rels = 
lappend_int(partitioned_child_rels,
-                                                                               
                 childRTindex);
-
-               /*
                 * Build a PlanRowMark if parent is marked FOR UPDATE/SHARE.
                 */
                if (oldrc)
@@ -1604,12 +1596,78 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
                         */
                         newrc->isParent = (childrte->relkind == RELKIND_PARTITIONED_TABLE);
 
-                       /* Include child's rowmark type in parent's allMarkTypes */
-                       oldrc->allMarkTypes |= newrc->allMarkTypes;
 
                        root->rowMarks = lappend(root->rowMarks, newrc);
                }
 
+               /*
+                * No need to create an AppendRelInfo for a partition at this point,
+                * because we don't know yet if it will actually be scanned by this
+                * query.  The fact that this is a partition of the parent table
+                * will be recorded in the PartitionInfo created for the parent
+                * table.
+                */
+               if (rel_is_partition(childOID) &&
+                       childrte->relkind != RELKIND_PARTITIONED_TABLE)
+               {
+                       LeafPartitionInfo   *lpinfo = makeNode(LeafPartitionInfo);
+
+                       lpinfo->reloid = childOID;
+                       lpinfo->relid = childRTindex;
+                       leaf_part_infos = lappend(leaf_part_infos, lpinfo);
+                       continue;
+               }
+
+               if (childrte->relkind == RELKIND_PARTITIONED_TABLE)
+               {
+                       partitioned_child_rels = lappend_int(partitioned_child_rels,
+                                                            childRTindex);
+                       continue;
+               }
+
+               /*
+                * This must be a non-partitioned child table that is not a partition.
+                * Build an AppendRelInfo for it to remember the parent-child
+                * relationship.
+                */
+
+               /* Open rel if needed, we already have required locks */
+               if (childOID != parentOID)
+                       newrelation = heap_open(childOID, NoLock);
+               else
+                       newrelation = oldrelation;
+
+               appinfo = makeNode(AppendRelInfo);
+               appinfo->parent_relid = rti;
+               appinfo->child_relid = childRTindex;
+               appinfo->parent_reltype = oldrelation->rd_rel->reltype;
+               appinfo->child_reltype = newrelation->rd_rel->reltype;
+               make_inh_translation_list(oldrelation, newrelation, childRTindex,
+                                                                 &appinfo->translated_vars);
+               appinfo->parent_reloid = parentOID;
+               appinfos = lappend(appinfos, appinfo);
+
+               /*
+                * Translate the column permissions bitmaps to the child's attnums
+                * (we have to build the translated_vars list before we can do
+                * this). But if this is the parent table, leave copyObject's
+                * result alone.
+                *
+                * Note: we need to do this even though the executor won't run any
+                * permissions checks on the child RTE.  The
+                * insertedCols/updatedCols bitmaps may be examined for
+                * trigger-firing purposes.
+                */
+               if (childOID != parentOID)
+               {
+                       childrte->selectedCols = translate_col_privs(rte->selectedCols,
+                                                                    appinfo->translated_vars);
+                       childrte->insertedCols = translate_col_privs(rte->insertedCols,
+                                                                    appinfo->translated_vars);
+                       childrte->updatedCols = translate_col_privs(rte->updatedCols,
+                                                                   appinfo->translated_vars);
+               }
+
                /* Close child relations, but keep locks */
                if (childOID != parentOID)
                        heap_close(newrelation, NoLock);
@@ -1618,35 +1676,53 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
        heap_close(oldrelation, NoLock);
 
        /*
-        * If all the children were temp tables or a partitioned parent did not
-        * have any leaf partitions, pretend it's a non-inheritance situation; 
we
-        * don't need Append node in that case.  The duplicate RTE we added for
-        * the parent table is harmless, so we don't bother to get rid of it;
-        * ditto for the useless PlanRowMark node.
+        * We keep a list of objects in root, each of which maps a partitioned
+        * parent RT index to a bunch of information about the partition tree
+        * rooted at that parent.  The information includes a list of RT indexes
+        * of partitioned tables appearing in the tree, a list of PartitionInfo
+        * objects (one per such partitioned table), a list of LeafPartitionInfo
+        * objects (one per leaf partition in the tree), and finally a list of
+        * leaf partition OIDs in the order in which find_all_inheritors()
+        * returned them.  When creating an Append or a ModifyTable path for the
+        * parent, the first of these lists is copied verbatim into the path (and
+        * subsequently into the plan) so that it is carried over to the
+        * executor.  That list is the only place where the executor can find the
+        * partitioned child tables in order to lock them.
         */
-       if (!has_child)
+       if (rte->relkind == RELKIND_PARTITIONED_TABLE)
        {
-               /* Clear flag before returning */
-               rte->inh = false;
+               PartitionRootInfo *prinfo = makeNode(PartitionRootInfo);
+
+               Assert(list_length(partition_infos) >= 1);
+               prinfo->parent_relid = rti;
+               /*
+                * Be sure to include the parent's RT index, because the above code
+                * didn't.
+                */
+               prinfo->partitioned_relids = lcons_int(rti, partitioned_child_rels);
+               prinfo->partition_infos = partition_infos;
+               prinfo->leaf_part_infos = leaf_part_infos;
+               prinfo->orig_leaf_part_oids = orig_leaf_part_oids;
+
+               root->prinfo_list = lappend(root->prinfo_list, prinfo);
+
+               /*
+                * Our job here is done, because we didn't create any AppendRelInfos.
+                */
                return;
        }
 
        /*
-        * We keep a list of objects in root, each of which maps a partitioned
-        * parent RT index to the list of RT indexes of its partitioned child
-        * tables.  When creating an Append or a ModifyTable path for the 
parent,
-        * we copy the child RT index list verbatim to the path so that it could
-        * be carried over to the executor so that the latter could identify the
-        * partitioned child tables.
+        * If all the children were temp tables, pretend it's a non-inheritance
+        * situation; we don't need Append node in that case.  The duplicate
+        * RTE we added for the parent table is harmless, so we don't bother to
+        * get rid of it; ditto for the useless PlanRowMark node.
         */
-       if (partitioned_child_rels != NIL)
+       if (list_length(appinfos) < 2)
        {
-               pcinfo = makeNode(PartitionedChildRelInfo);
-
-               Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
-               pcinfo->parent_relid = rti;
-               pcinfo->child_rels = partitioned_child_rels;
-               root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
+               /* Clear flag before returning */
+               rte->inh = false;
+               return;
        }
 
        /* Otherwise, OK to add to root->append_rel_list */
@@ -1767,7 +1843,7 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation,
  * query is really only going to reference the inherited columns.  Instead
  * we set the per-column bits for all inherited columns.
  */
-static Bitmapset *
+Bitmapset *
 translate_col_privs(const Bitmapset *parent_privs,
                                        List *translated_vars)
 {
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index a1ebd4acc8..5607a4e4e0 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1577,6 +1577,50 @@ build_physical_tlist(PlannerInfo *root, RelOptInfo *rel)
 }
 
 /*
+ * build_rel_vars
+ *
+ * Returns a list containing Var expressions corresponding to a relation's
+ * attributes.  Since the caller may already have the RangeTblEntry, we pass
+ * it instead of PlannerInfo to avoid finding it in the range table all over
+ * again.
+ */
+List *
+build_rel_vars(RangeTblEntry *rte, Index relid)
+{
+       Relation        relation;
+       AttrNumber      attrno;
+       int                     numattrs;
+       List       *result = NIL;
+
+       Assert(rte->rtekind == RTE_RELATION);
+
+       /* Assume we already have adequate lock */
+       relation = heap_open(rte->relid, NoLock);
+
+       numattrs = RelationGetNumberOfAttributes(relation);
+       for (attrno = 1; attrno <= numattrs; attrno++)
+       {
+               Form_pg_attribute att_tup = TupleDescAttr(relation->rd_att,
+                                                         attrno - 1);
+
+               if (att_tup->attisdropped)
+                       continue;
+
+               result = lappend(result,
+                                                makeVar(relid,
+                                                                attrno,
+                                                                att_tup->atttypid,
+                                                                att_tup->atttypmod,
+                                                                att_tup->attcollation,
+                                                                0));
+
+       }
+
+       heap_close(relation, NoLock);
+       return result;
+}
+
+/*
  * build_index_tlist
  *
  * Build a targetlist representing the columns of the specified index.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8ad0b4a669..4cc32dea8d 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,7 +16,9 @@
 
 #include <limits.h>
 
+#include "catalog/pg_class.h"
 #include "miscadmin.h"
+#include "nodes/relation.h"
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
@@ -146,6 +148,15 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
        rel->baserestrict_min_security = UINT_MAX;
        rel->joininfo = NIL;
        rel->has_eclass_joins = false;
+       /* Set in build_simple_rel if rel is root partitioned table */
+       rel->num_parted = 0;
+       rel->partition_infos = NULL;
+       rel->num_leaf_parts = 0;
+       rel->leaf_part_infos = NULL;
+       /* Set in get_rel_partitions_recurse */
+       rel->live_partition_painfos = NIL;
+       /* Set in set_append_rel_size if rel is a partition. */
+       rel->root_parent_relid = 0;
 
        /*
         * Pass top parent's relids down the inheritance hierarchy. If the 
parent
@@ -210,25 +221,73 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
                                                                                 list_length(rte->securityQuals));
 
        /*
-        * If this rel is an appendrel parent, recurse to build "other rel"
-        * RelOptInfos for its children.  They are "other rels" because they are
-        * not in the main join tree, but we will need RelOptInfos to plan access
-        * to them.
+        * If this rel is an appendrel parent, generate additional information
+        * based on whether the parent is a partitioned table or not.  For
+        * regular parent tables, recurse to build "other rel" RelOptInfos for its
+        * children.  They are "other rels" because they are not in the main join
+        * tree, but we will need RelOptInfos to plan access to them.  For
+        * partitioned parent tables, we do not yet create "other rel" RelOptInfos
+        * for the children.  Instead, we set up some information that will be
+        * used in set_append_rel_size() to look up its partitions.
         */
        if (rte->inh)
        {
                ListCell   *l;
 
-               foreach(l, root->append_rel_list)
+               if (rte->relkind == RELKIND_PARTITIONED_TABLE)
                {
-                       AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
+                       PartitionRootInfo *prinfo = NULL;
+                       LeafPartitionInfo **lpinfos;
+                       int             i;
+
+                       foreach(l, root->prinfo_list)
+                       {
+                               prinfo = lfirst(l);
+                               if (prinfo->parent_relid == relid)
+                                       break;
+                       }
+                       Assert(prinfo != NULL && prinfo->parent_relid == relid);
+
+                       rel->num_parted = list_length(prinfo->partition_infos);
+                       rel->num_leaf_parts = list_length(prinfo->leaf_part_infos);
+                       rel->partition_infos = (PartitionInfo **)
+                                                       palloc0(rel->num_parted *
+                                                                       sizeof(PartitionInfo *));
+                       lpinfos = (LeafPartitionInfo **) palloc0(rel->num_leaf_parts *
+                                                                       sizeof(LeafPartitionInfo *));
+                       i = 0;
+                       foreach(l, prinfo->partition_infos)
+                       {
+                               rel->partition_infos[i++] = lfirst(l);
+                       }
+                       i = 0;
+                       foreach(l, prinfo->leaf_part_infos)
+                       {
+                               lpinfos[i++] = lfirst(l);
+                       }
+                       rel->leaf_part_infos = lpinfos;
 
-                       /* append_rel_list contains all append rels; ignore 
others */
-                       if (appinfo->parent_relid != relid)
-                               continue;
+                       /*
+                        * Don't build RelOptInfo for partitions yet; we don't know which
+                        * ones we'll need.  We did create RangeTblEntry's though, so we
+                        * have an empty slot in root->simple_rel_array that will be
+                        * filled eventually if the respective partition is chosen to be
+                        * scanned after all.
+                        */
+               }
+               else
+               {
+                       foreach(l, root->append_rel_list)
+                       {
+                               AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
+
+                               /* append_rel_list contains all append rels; ignore others */
+                               if (appinfo->parent_relid != relid)
+                                       continue;
 
-                       (void) build_simple_rel(root, appinfo->child_relid,
-                                                                       rel);
+                               (void) build_simple_rel(root, appinfo->child_relid,
+                                                                               rel);
+                       }
                }
        }
 
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 82763f8013..ebbc3da985 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -1817,6 +1817,28 @@ get_rel_relkind(Oid relid)
 }
 
 /*
+ * rel_is_partition
+ *
+ *             Returns whether the given relation is a partition.
+ */
+char
+rel_is_partition(Oid relid)
+{
+       HeapTuple       tp;
+       Form_pg_class reltup;
+       bool            result;
+
+       tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+       if (!HeapTupleIsValid(tp))
+               elog(ERROR, "cache lookup failed for relation %u", relid);
+       reltup = (Form_pg_class) GETSTRUCT(tp);
+       result = reltup->relispartition;
+       ReleaseSysCache(tp);
+
+       return result;
+}
+
+/*
  * get_rel_tablespace
  *
  *             Returns the pg_tablespace OID associated with a given relation.
@@ -1865,6 +1887,34 @@ get_rel_persistence(Oid relid)
        return result;
 }
 
+/*
+ * rel_is_other_temp
+ *
+ *             Returns whether a relation is a temp table from another session
+ */
+bool
+rel_is_other_temp(Oid relid)
+{
+       HeapTuple       tp;
+       Form_pg_class reltup;
+       bool            result = false;
+
+       tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+       if (!HeapTupleIsValid(tp))
+               elog(ERROR, "cache lookup failed for relation %u", relid);
+       reltup = (Form_pg_class) GETSTRUCT(tp);
+
+       if (reltup->relpersistence == RELPERSISTENCE_TEMP &&
+               !isTempOrTempToastNamespace(reltup->relnamespace))
+       {
+               result = true;
+       }
+
+       ReleaseSysCache(tp);
+
+       return result;
+}
+
 
 /*                             ---------- TRANSFORM CACHE ----------           
                                 */
 
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 7b53baf847..b5dcb22688 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -16,6 +16,7 @@
 #include "fmgr.h"
 #include "executor/tuptable.h"
 #include "nodes/execnodes.h"
+#include "nodes/relation.h"
 #include "parser/parse_node.h"
 #include "utils/rel.h"
 
@@ -87,4 +88,7 @@ extern int get_partition_for_tuple(PartitionTupleRoutingInfo **ptrinfos,
                                                EState *estate,
                                                PartitionTupleRoutingInfo **failed_at,
                                                TupleTableSlot **failed_slot);
+
+/* Planner support stuff. */
+extern List *get_partitions_for_keys(PartitionDispatch pd);
 #endif                                                 /* PARTITION_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 27bd4f3363..e957615ac6 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -260,7 +260,10 @@ typedef enum NodeTag
        T_PlaceHolderVar,
        T_SpecialJoinInfo,
        T_AppendRelInfo,
-       T_PartitionedChildRelInfo,
+       T_PartitionInfo,
+       T_LeafPartitionInfo,
+       T_PartitionAppendInfo,
+       T_PartitionRootInfo,
        T_PlaceHolderInfo,
        T_MinMaxAggInfo,
        T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 3ccc9d1b03..71c494a7c2 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -251,7 +251,7 @@ typedef struct PlannerInfo
 
        List       *append_rel_list;    /* list of AppendRelInfos */
 
-       List       *pcinfo_list;        /* list of PartitionedChildRelInfos */
+       List       *prinfo_list;        /* list of PartitionRootInfos */
 
        List       *rowMarks;           /* list of PlanRowMarks */
 
@@ -515,6 +515,9 @@ typedef enum RelOptKind
 /* Is the given relation an "other" relation? */
 #define IS_OTHER_REL(rel) ((rel)->reloptkind == RELOPT_OTHER_MEMBER_REL)
 
+typedef struct PartitionInfo PartitionInfo;
+typedef struct LeafPartitionInfo LeafPartitionInfo;
+
 typedef struct RelOptInfo
 {
        NodeTag         type;
@@ -592,6 +595,23 @@ typedef struct RelOptInfo
 
        /* used by "other" relations */
        Relids          top_parent_relids;      /* Relids of topmost parents */
+
+       /* Fields set for "root" partitioned relations */
+       int             num_parted;                     /* Number of entries in partition_infos */
+       PartitionInfo **partition_infos;
+       int             num_leaf_parts;         /* Number of entries in leaf_part_infos */
+       LeafPartitionInfo **leaf_part_infos;    /* LeafPartitionInfos */
+
+       /* Fields set for partitioned relations (list of PartitionAppendInfo's) */
+       List   *live_partition_painfos;
+
+       /* Fields set for partition otherrels */
+
+       /*
+        * RT index of the root partitioned table in the partition tree of
+        * which this rel is a member.
+        */
+       Index   root_parent_relid;
 } RelOptInfo;
 
 /*
@@ -2012,24 +2032,73 @@ typedef struct AppendRelInfo
        Oid                     parent_reloid;  /* OID of parent relation */
 } AppendRelInfo;
 
+/* Forward declarations, to avoid including other headers */
+typedef struct PartitionDispatchData *PartitionDispatch;
+
+/*
+ * PartitionInfo - information about partitioning of one partitioned table in
+ *                                a given partition tree
+ */
+typedef struct PartitionInfo
+{
+       NodeTag         type;
+
+       Index                           relid;          /* Ordinal position in the rangetable */
+       PartitionDispatch       pd;                     /* Information about partitions */
+} PartitionInfo;
+
+/*
+ * LeafPartitionInfo - (OID, RT index) pair for one leaf partition
+ *
+ * Created when a leaf partition's RT entry is created in
+ * expand_inherited_rtentry().
+ */
+typedef struct LeafPartitionInfo
+{
+       NodeTag         type;
+
+       Oid                     reloid;                 /* OID */
+       Index           relid;                  /* RT index */
+} LeafPartitionInfo;
+
 /*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree.  We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
+ * PartitionAppendInfo - list of child RT indexes for one partitioned table
+ *                                              in a given partition tree
+ */
+typedef struct PartitionAppendInfo
+{
+       NodeTag         type;
+
+       Index           parent_relid;
+       List       *live_partition_relids;      /* List of RT indexes */
+} PartitionAppendInfo;
+
+/*
+ * For a partitioned table, this maps its RT index to the information about
+ * the partition tree collected in expand_inherited_rtentry().
+ *
+ * That information includes a list of PartitionInfo nodes, one for each
+ * partitioned table in the partition tree, including for the table itself.
+ * Also included is a list of RT indexes of the entries for leaf partitions
+ * that are created at the same time by expand_inherited_rtentry().
+ *
+ * orig_leaf_part_oids contains the list of leaf partition OIDs as it was
+ * generated by find_all_inheritors().  We keep it around so that we can
+ * lock leaf partitions in that order when we actually do it.
  *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
+ * PartitionRootInfo's for different partitioned tables in a query are placed
+ * in root->prinfo_list.
  */
-typedef struct PartitionedChildRelInfo
+typedef struct PartitionRootInfo
 {
        NodeTag         type;
 
        Index           parent_relid;
-       List       *child_rels;
-} PartitionedChildRelInfo;
+       List       *partition_infos;
+       List       *partitioned_relids;
+       List       *leaf_part_infos;
+       List       *orig_leaf_part_oids;
+} PartitionRootInfo;
 
 /*
  * For each distinct placeholder expression generated during planning, we
diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h
index 71f0faf938..1e18f609b1 100644
--- a/src/include/optimizer/plancat.h
+++ b/src/include/optimizer/plancat.h
@@ -39,6 +39,7 @@ extern bool relation_excluded_by_constraints(PlannerInfo *root,
                                                                 RelOptInfo *rel, RangeTblEntry *rte);
 
 extern List *build_physical_tlist(PlannerInfo *root, RelOptInfo *rel);
+extern List *build_rel_vars(RangeTblEntry *rte, Index relid);
 
 extern bool has_unique_index(RelOptInfo *rel, AttrNumber attno);
 
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index 4be0afd566..d0af8dc7bc 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -16,6 +16,7 @@
 
 #include "nodes/plannodes.h"
 #include "nodes/relation.h"
+#include "utils/rel.h"
 
 
 /*
@@ -51,6 +52,8 @@ extern PlanRowMark *get_plan_rowmark(List *rowmarks, Index rtindex);
 extern RelOptInfo *plan_set_operations(PlannerInfo *root);
 
 extern void expand_inherited_tables(PlannerInfo *root);
+extern Bitmapset *translate_col_privs(const Bitmapset *parent_privs,
+                                       List *translated_vars);
 
 extern Node *adjust_appendrel_attrs(PlannerInfo *root, Node *node,
                                           int nappinfos, AppendRelInfo **appinfos);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 07208b56ce..b5b615a6fa 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -126,8 +126,10 @@ extern char *get_rel_name(Oid relid);
 extern Oid     get_rel_namespace(Oid relid);
 extern Oid     get_rel_type_id(Oid relid);
 extern char get_rel_relkind(Oid relid);
+extern bool rel_is_partition(Oid relid);
 extern Oid     get_rel_tablespace(Oid relid);
 extern char get_rel_persistence(Oid relid);
+extern bool rel_is_other_temp(Oid relid);
 extern Oid     get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
 extern Oid     get_transform_tosql(Oid typid, Oid langid, List *trftypes);
 extern bool get_typisdefined(Oid typid);
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index a2d9469592..e159d62b66 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -278,12 +278,12 @@ select tableoid::regclass, * from list_parted;
 -------------+----+----
  part_aa_bb  | aA |   
  part_cc_dd  | cC |  1
- part_null   |    |  0
- part_null   |    |  1
  part_ee_ff1 | ff |  1
  part_ee_ff1 | EE |  1
  part_ee_ff2 | ff | 11
  part_ee_ff2 | EE | 10
+ part_null   |    |  0
+ part_null   |    |  1
 (8 rows)
 
 -- some more tests to exercise tuple-routing with multi-level partitioning
-- 
2.11.0
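
To illustrate why PartitionRootInfo keeps orig_leaf_part_oids around, here is
a minimal standalone sketch.  It is not code from the patch: the Oid typedef
is a stand-in for the PostgreSQL type, and oid_is_live() and
lock_order_for_live_parts() are hypothetical helper names.  It assumes that
pruning has already produced some set of surviving leaf partition OIDs, and
shows one way to visit only those partitions while still following the order
recorded in orig_leaf_part_oids, which is the point where the deferred
locking would take place.

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;		/* stand-in for PostgreSQL's Oid */

/* Hypothetical helper: did 'oid' survive pruning? */
static bool
oid_is_live(Oid oid, const Oid *live, size_t nlive)
{
	for (size_t i = 0; i < nlive; i++)
	{
		if (live[i] == oid)
			return true;
	}
	return false;
}

/*
 * Hypothetical helper: copy into 'out' the OIDs that appear in 'live',
 * keeping the order in which they appear in 'orig' (the analogue of
 * orig_leaf_part_oids).  'out' must have room for 'norig' entries.
 * Returns the number of OIDs written.
 *
 * In the planner, each OID emitted here is where the leaf partition would
 * actually be locked and opened.
 */
static size_t
lock_order_for_live_parts(const Oid *orig, size_t norig,
						  const Oid *live, size_t nlive,
						  Oid *out)
{
	size_t		n = 0;

	for (size_t i = 0; i < norig; i++)
	{
		if (oid_is_live(orig[i], live, nlive))
			out[n++] = orig[i];
	}
	return n;
}
```

Walking the recorded list and skipping pruned entries means the lock order
never depends on the order in which pruning happened to report the surviving
partitions, so different backends touching the same partitions would still
lock them in the same relative order.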

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
