On 2017/09/04 10:10, Amit Langote wrote:
> On 2017/09/02 2:52, Robert Haas wrote:
>> It strikes me that this patch set is doing two things but maybe in the
>> opposite order that I would have chosen to attack them.  First,
>> there's getting partition pruning to use something other than
>> constraint exclusion.  Second, there's deferring work that is
>> currently done at an early stage of the process until later, so that
>> we waste less effort on partitions that are ultimately going to be
>> pruned.
> 
> OK.
> 
>> Therefore, IMHO, it would be best to focus first on how we're going to
>> identify the partitions that survive pruning, and then afterwards work
>> on transposing that logic to happen before partitions are opened and
>> locked.  That way, we get some incremental benefit sooner, and also
>> unblock some other development work.
> 
> Alright, I will try to do it that way.

Attached set of patches that does things that way.  Individual patches
described below:

[PATCH 1/5] Expand partitioned inheritance in a non-flattened manner

This will allow us to perform scan and join planning in a per partition
sub-tree manner, with each sub-tree's root getting its own RelOptInfo.
Previously, only the root of the whole partition tree would get a
RelOptInfo, along with the leaf partitions, with each leaf partition's
AppendRelInfo pointing to the former as its parent.

This is essential, because we will be doing partition-pruning for every
partitioned table in the tree by matching query's scan keys with its
partition key.  We won't be able to do that if the intermediate
partitioned tables didn't have a RelOptInfo.

[PATCH 2/5] WIP: planner-side changes for partition-pruning

This patch adds a stub get_partitions_for_keys in partition.c with a
suitable interface for the caller to pass bounding keys extracted from the
query and other related information.

Importantly, it implements the logic within the planner to match query's
scan keys to a parent table's partition key and form the bounding keys
that will be passed to partition.c to compute the list of partitions that
satisfy those bounds.

Also, it adds certain fields to RelOptInfos of the partitioned tables that
reflect its partitioning properties.

[PATCH 3/5] WIP: Interface changes for partition_bound_{cmp/bsearch}

This guy modifies the partition bound comparison function so that the
caller can pass incomplete partition key tuple that is potentially a
prefix of a multi-column partition key.  Currently, the input tuple must
contain all of key->partnatts values, but that may not be true for
queries, which may not have restrictions on all the partition key columns.

[PATCH 4/5] WIP: Implement get_partitions_for_keys()

This one fills the get_partitions_for_keys stub with the actual logic that
searches the partdesc->boundinfo for partition bounds that match the
bounding keys specified by the caller.

[PATCH 5/5] Add more tests for the new partitioning-related planning

More tests.


Some TODO items still remain to be done:

* Process OR clauses to use for partition-pruning
* Process redundant clauses (such as a = 10 and a > 1) more smartly
* Other tricks that are missing
* Fix bugs
* More tests

Thanks,
Amit
From 17dfaff62fe04cf18f5bba298974d42f92b597ef Mon Sep 17 00:00:00 2001
From: amit <amitlangot...@gmail.com>
Date: Wed, 6 Sep 2017 09:28:14 +0900
Subject: [PATCH 1/5] Expand partitioned inheritance in a non-flattened manner

...except when the partitioned table in question is the result rel
of the query.

This allows us perform scan and join planning for each sub-tree in a
given partition tree, with each sub-tree's root partitioned table
getting its own RelOptInfo.  Previously only the root of the whole
partition tree got a RelOptInfo, along with the leaf partitions,
with each leaf partition's AppendRelInfo pointing to the former as
its parent.
---
 src/backend/optimizer/path/allpaths.c  |  34 ++++++-
 src/backend/optimizer/plan/planner.c   |   3 +-
 src/backend/optimizer/prep/prepunion.c | 166 +++++++++++++++++++++------------
 src/backend/optimizer/util/plancat.c   |   7 +-
 4 files changed, 146 insertions(+), 64 deletions(-)

diff --git a/src/backend/optimizer/path/allpaths.c 
b/src/backend/optimizer/path/allpaths.c
index 2d7e1d84d0..6c3511bd47 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1289,11 +1289,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo 
*rel,
        RangeTblEntry *rte;
 
        rte = planner_rt_fetch(rel->relid, root);
+
+       /*
+        * Get the partitioned_rels list from root->pcinfo_list after
+        * confirming that rel is actually a root partitioned table.
+        */
        if (rte->relkind == RELKIND_PARTITIONED_TABLE)
        {
-               partitioned_rels = get_partitioned_child_rels(root, rel->relid);
-               /* The root partitioned table is included as a child rel */
-               Assert(list_length(partitioned_rels) >= 1);
+               int             parent_relid;
+               bool    is_root_partitioned_table = false;
+
+               /*
+                * Normally, only the root partitioned rel will be 
RELOPT_BASEREL
+                * in a given partitione tree, except when the root table itself
+                * is a child in the case of a UNION ALL query.
+                */
+               if (!IS_OTHER_REL(rel))
+                       is_root_partitioned_table = true;
+               else if (bms_get_singleton_member(rel->top_parent_relids,
+                                                                               
  &parent_relid))
+               {
+                       RelOptInfo *parent_rel;
+
+                       parent_rel = root->simple_rel_array[parent_relid];
+                       is_root_partitioned_table =
+                                               (parent_rel->rtekind != 
RTE_RELATION);
+               }
+
+               if (is_root_partitioned_table)
+               {
+                       partitioned_rels = get_partitioned_child_rels(root, 
rel->relid);
+                       /* The root partitioned table is included as a child 
rel */
+                       Assert(list_length(partitioned_rels) >= 1);
+               }
        }
 
        /*
diff --git a/src/backend/optimizer/plan/planner.c 
b/src/backend/optimizer/plan/planner.c
index 966230256e..02662fad5d 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6076,7 +6076,8 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
  *             Returns a list of the RT indexes of the partitioned child 
relations
  *             with rti as the root parent RT index.
  *
- * Note: Only call this function on RTEs known to be partitioned tables.
+ * Note: Only call this function on RTEs known to be "root" partitioned
+ *              tables.
  */
 List *
 get_partitioned_child_rels(PlannerInfo *root, Index rti)
diff --git a/src/backend/optimizer/prep/prepunion.c 
b/src/backend/optimizer/prep/prepunion.c
index ccf21453fd..433505948d 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -113,7 +113,8 @@ static void expand_single_inheritance_child(PlannerInfo 
*root,
                                                                Index 
parentRTindex, Relation parentrel,
                                                                PlanRowMark 
*parentrc, Relation childrel,
                                                                bool 
*has_child, List **appinfos,
-                                                               List 
**partitioned_child_rels);
+                                                               List 
**partitioned_child_rels,
+                                                               RangeTblEntry 
**newrte, Index *newRTindex);
 static void make_inh_translation_list(Relation oldrelation,
                                                  Relation newrelation,
                                                  Index newvarno,
@@ -1380,8 +1381,8 @@ expand_inherited_tables(PlannerInfo *root)
  * regular inheritance, a parent RTE must always have at least two associated
  * AppendRelInfos: one corresponding to the parent table as a simple member of
  * inheritance set and one or more corresponding to the actual children.
- * Since a partitioned table is not scanned, it might have only one associated
- * AppendRelInfo.
+ * Since a partitioned table parent is itself not scanned, it might have only
+ * one associated AppendRelInfo.
  */
 static void
 expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
@@ -1473,13 +1474,10 @@ expand_inherited_rtentry(PlannerInfo *root, 
RangeTblEntry *rte, Index rti)
        {
                /*
                 * If this table has partitions, recursively expand them in the 
order
-                * in which they appear in the PartitionDesc.  But first, 
expand the
-                * parent itself.
+                * in which they appear in the PartitionDesc.  Also, start 
collecting
+                * the RT indexes of the partitioned tables in the partition 
tree.
                 */
-               expand_single_inheritance_child(root, rte, rti, oldrelation, 
oldrc,
-                                                                               
oldrelation,
-                                                                               
&has_child, &appinfos,
-                                                                               
&partitioned_child_rels);
+               partitioned_child_rels = list_make1_int(rti);
                expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
                                                                          
RelationGetPartitionDesc(oldrelation),
                                                                          
lockmode,
@@ -1516,10 +1514,14 @@ expand_inherited_rtentry(PlannerInfo *root, 
RangeTblEntry *rte, Index rti)
                                continue;
                        }
 
+                       /*
+                        * Don't expect to find any partitioned tables in a 
regular
+                        * inheritance tree, so pass NULL for 
partitioned_child_rels here.
+                        */
                        expand_single_inheritance_child(root, rte, rti, 
oldrelation, oldrc,
                                                                                
        newrelation,
                                                                                
        &has_child, &appinfos,
-                                                                               
        &partitioned_child_rels);
+                                                                               
        NULL, NULL, NULL);
 
                        /* Close child relations, but keep locks */
                        if (childOID != parentOID)
@@ -1581,6 +1583,8 @@ expand_partitioned_rtentry(PlannerInfo *root, 
RangeTblEntry *parentrte,
        {
                Oid                     childOID = partdesc->oids[i];
                Relation        childrel;
+               RangeTblEntry *childrte;
+               Index           childRTindex;
 
                /* Open rel; we already have required locks */
                childrel = heap_open(childOID, NoLock);
@@ -1592,19 +1596,60 @@ expand_partitioned_rtentry(PlannerInfo *root, 
RangeTblEntry *parentrte,
                        continue;
                }
 
-               expand_single_inheritance_child(root, parentrte, parentRTindex,
+               expand_single_inheritance_child(root,
+                                                                               
parentrte, parentRTindex,
                                                                                
parentrel, parentrc, childrel,
                                                                                
has_child, appinfos,
-                                                                               
partitioned_child_rels);
+                                                                               
partitioned_child_rels,
+                                                                               
&childrte, &childRTindex);
 
                /* If this child is itself partitioned, recurse */
                if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-                       expand_partitioned_rtentry(root, parentrte, 
parentRTindex,
-                                                                               
  parentrel, parentrc,
-                                                                               
  RelationGetPartitionDesc(childrel),
-                                                                               
  lockmode,
-                                                                               
  has_child, appinfos,
-                                                                               
  partitioned_child_rels);
+               {
+                       RangeTblEntry *new_parentrte;
+                       Index           new_parentRTindex;
+                       Relation        new_parentrel;
+
+                       /*
+                        * For SELECT queries, it's desirable to perform scan 
and join
+                        * planning on the individual partition sub-trees, 
instead of
+                        * doing the same on the whole tree at once.  This 
allows to apply
+                        * techniques such as parition-pruning and/or 
partition-wise join
+                        * on the individual partition sub-trees.  For that to 
happen,
+                        * root parent of each sub-tree must get an RTE with 
inh set to
+                        * true, which must be already taken care of by
+                        * expand_single_inheritance_child().  Next, for each 
of the
+                        * children, we must record immediate parent as its 
parent in the
+                        * the child AppendRelInfo, instead of the root parent 
of the
+                        * whole tree.
+                        *
+                        * If parent is the query's result relation, 
inheritance_planner()
+                        * will expand the inheritance so as to apply the 
*whole* query to
+                        * each leaf partition, which means we cannot apply
+                        * partition-pruning and/or partition-wise join to a 
partitioned
+                        * result relation, meaning there is not much point in 
expanding
+                        * the tree hierarchically.
+                        */
+                       if (parentRTindex == root->parse->resultRelation)
+                       {
+                               new_parentrte = parentrte;
+                               new_parentRTindex = parentRTindex;
+                               new_parentrel = parentrel;
+                       }
+                       else
+                       {
+                               new_parentrte = childrte;
+                               new_parentRTindex = childRTindex;
+                               new_parentrel = childrel;
+                       }
+
+                       expand_partitioned_rtentry(root, new_parentrte, 
new_parentRTindex,
+                                                                          
new_parentrel, parentrc,
+                                                                          
RelationGetPartitionDesc(childrel),
+                                                                          
lockmode,
+                                                                          
has_child, appinfos,
+                                                                          
partitioned_child_rels);
+               }
 
                /* Close child relation, but keep locks */
                heap_close(childrel, NoLock);
@@ -1619,13 +1664,17 @@ expand_partitioned_rtentry(PlannerInfo *root, 
RangeTblEntry *parentrte,
  * anything at all.  Otherwise, we'll set "has_child" to true, build a
  * RangeTblEntry and either a PartitionedChildRelInfo or AppendRelInfo as
  * appropriate, plus maybe a PlanRowMark.
+ *
+ * The newly created RT entry and its RT index are returned in *newrte and
+ * *newRTindex, respectively.
  */
 static void
 expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
                                                                Index 
parentRTindex, Relation parentrel,
                                                                PlanRowMark 
*parentrc, Relation childrel,
                                                                bool 
*has_child, List **appinfos,
-                                                               List 
**partitioned_child_rels)
+                                                               List 
**partitioned_child_rels,
+                                                               RangeTblEntry 
**newrte, Index *newRTindex)
 {
        Query      *parse = root->parse;
        Oid                     parentOID = RelationGetRelid(parentrel);
@@ -1649,54 +1698,46 @@ expand_single_inheritance_child(PlannerInfo *root, 
RangeTblEntry *parentrte,
        childrte = copyObject(parentrte);
        childrte->relid = childOID;
        childrte->relkind = childrel->rd_rel->relkind;
-       childrte->inh = false;
+       childrte->inh = (childrte->relkind == RELKIND_PARTITIONED_TABLE);
        childrte->requiredPerms = 0;
        childrte->securityQuals = NIL;
        parse->rtable = lappend(parse->rtable, childrte);
        childRTindex = list_length(parse->rtable);
 
+       /* Build an AppendRelInfo for this parent and child. */
+
+       /* Remember if we saw a real child. */
+       if (childOID != parentOID)
+               *has_child = true;
+
+       appinfo = makeNode(AppendRelInfo);
+       appinfo->parent_relid = parentRTindex;
+       appinfo->child_relid = childRTindex;
+       appinfo->parent_reltype = parentrel->rd_rel->reltype;
+       appinfo->child_reltype = childrel->rd_rel->reltype;
+       make_inh_translation_list(parentrel, childrel, childRTindex,
+                                                         
&appinfo->translated_vars);
+       appinfo->parent_reloid = parentOID;
+       *appinfos = lappend(*appinfos, appinfo);
+
        /*
-        * Build an AppendRelInfo for this parent and child, unless the child 
is a
-        * partitioned table.
+        * Translate the column permissions bitmaps to the child's attnums (we
+        * have to build the translated_vars list before we can do this). But
+        * if this is the parent table, leave copyObject's result alone.
+        *
+        * Note: we need to do this even though the executor won't run any
+        * permissions checks on the child RTE.  The insertedCols/updatedCols
+        * bitmaps may be examined for trigger-firing purposes.
         */
-       if (childrte->relkind != RELKIND_PARTITIONED_TABLE)
+       if (childOID != parentOID)
        {
-               /* Remember if we saw a real child. */
-               if (childOID != parentOID)
-                       *has_child = true;
-
-               appinfo = makeNode(AppendRelInfo);
-               appinfo->parent_relid = parentRTindex;
-               appinfo->child_relid = childRTindex;
-               appinfo->parent_reltype = parentrel->rd_rel->reltype;
-               appinfo->child_reltype = childrel->rd_rel->reltype;
-               make_inh_translation_list(parentrel, childrel, childRTindex,
-                                                                 
&appinfo->translated_vars);
-               appinfo->parent_reloid = parentOID;
-               *appinfos = lappend(*appinfos, appinfo);
-
-               /*
-                * Translate the column permissions bitmaps to the child's 
attnums (we
-                * have to build the translated_vars list before we can do 
this). But
-                * if this is the parent table, leave copyObject's result alone.
-                *
-                * Note: we need to do this even though the executor won't run 
any
-                * permissions checks on the child RTE.  The 
insertedCols/updatedCols
-                * bitmaps may be examined for trigger-firing purposes.
-                */
-               if (childOID != parentOID)
-               {
-                       childrte->selectedCols = 
translate_col_privs(parentrte->selectedCols,
-                                                                               
                                 appinfo->translated_vars);
-                       childrte->insertedCols = 
translate_col_privs(parentrte->insertedCols,
-                                                                               
                                 appinfo->translated_vars);
-                       childrte->updatedCols = 
translate_col_privs(parentrte->updatedCols,
-                                                                               
                                appinfo->translated_vars);
-               }
+               childrte->selectedCols = 
translate_col_privs(parentrte->selectedCols,
+                                                                               
                         appinfo->translated_vars);
+               childrte->insertedCols = 
translate_col_privs(parentrte->insertedCols,
+                                                                               
                         appinfo->translated_vars);
+               childrte->updatedCols = 
translate_col_privs(parentrte->updatedCols,
+                                                                               
                        appinfo->translated_vars);
        }
-       else
-               *partitioned_child_rels = lappend_int(*partitioned_child_rels,
-                                                                               
          childRTindex);
 
        /*
         * Build a PlanRowMark if parent is marked FOR UPDATE/SHARE.
@@ -1726,6 +1767,15 @@ expand_single_inheritance_child(PlannerInfo *root, 
RangeTblEntry *parentrte,
 
                root->rowMarks = lappend(root->rowMarks, childrc);
        }
+
+       if (partitioned_child_rels &&
+               childrte->relkind == RELKIND_PARTITIONED_TABLE)
+               *partitioned_child_rels = lappend_int(*partitioned_child_rels,
+                                                                               
          childRTindex);
+       if (newrte)
+               *newrte = childrte;
+       if (newRTindex)
+               *newRTindex = childRTindex;
 }
 
 /*
diff --git a/src/backend/optimizer/util/plancat.c 
b/src/backend/optimizer/util/plancat.c
index a1ebd4acc8..bfc05a1af5 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1402,8 +1402,11 @@ relation_excluded_by_constraints(PlannerInfo *root,
        if (predicate_refuted_by(safe_restrictions, safe_restrictions, false))
                return true;
 
-       /* Only plain relations have constraints */
-       if (rte->rtekind != RTE_RELATION || rte->inh)
+       /*
+        * Only plain relations have constraints.  In addition, there can be
+        * inheritance parent RTEs that are themselves partitions.
+        */
+       if (rte->rtekind != RTE_RELATION || (rte->inh && !IS_OTHER_REL(rel)))
                return false;
 
        /*
-- 
2.11.0

From 0f1d944c23c0f9170ffe8553ef2d22754fa3aab7 Mon Sep 17 00:00:00 2001
From: amit <amitlangot...@gmail.com>
Date: Tue, 22 Aug 2017 17:31:42 +0900
Subject: [PATCH 2/5] WIP: planner-side changes for partition-pruning

Firstly, this adds a stub get_partitions_for_keys() in partition.c
with appropriate interface for the caller to specify bounding scan
keys, along with other information about the scan keys extracted
from the query, such as NULL-ness of the keys, inclusive-ness, etc.

More importantly, this implements the planner-side logic to extract
bounding scan keys to be passed to get_partitions_for_keys.  That is,
it will go through rel->baserestrictinfo and match individual clauses
to partition keys and construct lower bound and upper bound tuples,
which may cover only a prefix of a multi-column partition key.

A bunch of smarts are still missing when mapping the clause operands
with keys.  For example, code to match a clause is specifed as
(constant op var) doesn't exist.  Also, redundant keys are not
eliminated, for example, a combination of clauses a = 10 and a > 1
will cause the later clause a > 1 taking over and resulting in
needless scanning of partitions containing values a > 1 and a < 10.

...constraint exclusion is still used, because
get_partitions_for_keys is just a stub...
---
 src/backend/catalog/partition.c       |  42 +++++
 src/backend/optimizer/path/allpaths.c | 308 +++++++++++++++++++++++++++++-----
 src/backend/optimizer/util/plancat.c  | 120 +++++++++++++
 src/backend/optimizer/util/relnode.c  |  20 +++
 src/include/catalog/partition.h       |   8 +
 src/include/nodes/nodes.h             |   1 +
 src/include/nodes/relation.h          | 135 +++++++++++++++
 src/include/optimizer/plancat.h       |   2 +
 8 files changed, 595 insertions(+), 41 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 50162632f5..bb3009e5b3 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1138,6 +1138,48 @@ RelationGetPartitionDispatchInfo(Relation rel,
        return pd;
 }
 
+/*
+ * get_partitions_for_keys
+ *             Returns the list of indexes of rel's partitions that will need 
to be
+ *             scanned given the bounding scan keys.
+ *
+ * Each value in the returned list can be used as an index into the oids array
+ * of the partition descriptor.
+ *
+ * Inputs:
+ *     keynullness contains between 0 and (key->partnatts - 1) values, each
+ *     telling what kind of NullTest has been applies to the corresponding
+ *     partition key column.  minkeys represents the lower bound on the 
partition
+ *     the key of the records that the query will return, while maxkeys
+ *     represents upper bound.  min_inclusive and max_inclusive tell whether 
the
+ *     bounds specified minkeys and maxkeys is inclusive, respectively.
+ *
+ * Other outputs:
+ *     *min_datum_index will return the index in boundinfo->datums of the first
+ *     datum that the query's bounding keys allow to be returned for the query.
+ *     Similarly, *max_datum_index.  *null_partition_chosen returns whether
+ *     the null partition will be scanned.
+ *
+ * TODO: Implement.
+ */
+List *
+get_partitions_for_keys(Relation rel,
+                                               NullTestType *keynullness,
+                                               Datum *minkeys, int n_minkeys, 
bool min_inclusive,
+                                               Datum *maxkeys, int n_maxkeys, 
bool max_inclusive,
+                                               int *min_datum_index, int 
*max_datum_index,
+                                               bool *null_partition_chosen)
+{
+       List   *result = NIL;
+       int             i;
+       PartitionDesc   partdesc = RelationGetPartitionDesc(rel);
+
+       for (i = 0; i < partdesc->nparts; i++)
+               result = lappend_int(result, i);
+
+       return result;
+}
+
 /* Module-local functions */
 
 /*
diff --git a/src/backend/optimizer/path/allpaths.c 
b/src/backend/optimizer/path/allpaths.c
index 6c3511bd47..97af646242 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,6 +20,7 @@
 
 #include "access/sysattr.h"
 #include "access/tsmapi.h"
+#include "catalog/partition.h"
 #include "catalog/pg_class.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_proc.h"
@@ -845,6 +846,222 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, 
RangeTblEntry *rte)
 }
 
 /*
+ * get_rel_partitions
+ *             Return the list of partitions of rel that pass the query clauses
+ *
+ * Returned list contains the AppendInfos of the chosen partitions.
+ */
+static List *
+get_rel_partitions(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+{
+       Relation                parent = heap_open(rte->relid, NoLock);
+       PartitionDesc   partdesc = RelationGetPartitionDesc(parent);
+       List   *indexes;
+       List   *result = NIL;
+       ListCell   *lc1,
+                          *lc2;
+       int             keyPos;
+       List   *matchedclauses[PARTITION_MAX_KEYS];
+       NullTestType keynullness[PARTITION_MAX_KEYS];
+       Datum   minkeys[PARTITION_MAX_KEYS],
+                       maxkeys[PARTITION_MAX_KEYS];
+       bool    need_next_min,
+                       need_next_max,
+                       minkey_set[PARTITION_MAX_KEYS],
+                       maxkey_set[PARTITION_MAX_KEYS],
+                       min_incl,
+                       max_incl;
+       int             n_minkeys = 0,
+                       n_maxkeys = 0,
+                       i;
+
+       /*
+        * Match individual OpExprs in the query's restriction with individual
+        * partition key columns.  There is one list per key.
+        */
+       memset(keynullness, -1, sizeof(keynullness));
+       memset(matchedclauses, 0, sizeof(matchedclauses));
+       keyPos = 0;
+       for (i = 0; i < rel->part_scheme->partnatts; i++)
+       {
+               Node   *partkey = linitial(rel->partexprs[i]);
+
+               foreach(lc2, rel->baserestrictinfo)
+               {
+                       RestrictInfo   *rinfo = lfirst(lc2);
+                       Expr               *clause = rinfo->clause;
+
+                       if (is_opclause(clause))
+                       {
+                               Node   *leftop = get_leftop(clause);
+
+                               if (IsA(leftop, RelabelType))
+                                       leftop = (Node *) ((RelabelType *) 
leftop)->arg;
+
+                               if (equal(leftop, partkey))
+                               {
+                                       matchedclauses[keyPos] = 
lappend(matchedclauses[keyPos],
+                                                                               
                         clause);
+                                       /* A strict operator implies NOT NULL 
argument. */
+                                       keynullness[keyPos] = IS_NOT_NULL;
+                               }
+                       }
+                       else if (IsA(clause, NullTest))
+                       {
+                               NullTest   *nulltest = (NullTest *) clause;
+                               Node       *arg = (Node *) nulltest->arg;
+
+                               if (equal(arg, partkey))
+                                       keynullness[keyPos] = 
nulltest->nulltesttype;
+                       }
+               }
+
+               /* Onto finding clauses matching the next partition key. */
+               keyPos++;
+       }
+
+       /*
+        * Determine the min keys and the max keys using btree semantics-based
+        * interpretation of the clauses' operators.
+        */
+
+       /*
+        * XXX - There should be a step similar to _bt_preprocess_keys() here,
+        * to eliminate any redundant scan keys for a given partition column.  
For
+        * example, among a <= 4 and a <= 5, we can only keep a <= 4 for being
+        * more restrictive and discard a <= 5.  While doing that, we can also
+        * check to see if there exists a contradictory combination of scan keys
+        * that makes the query trivially false for all records in the table.
+        */
+       memset(minkeys, 0, sizeof(minkeys));
+       memset(maxkeys, 0, sizeof(maxkeys));
+       memset(minkey_set, false, sizeof(minkey_set));
+       memset(maxkey_set, false, sizeof(maxkey_set));
+       need_next_min = true;
+       need_next_max = true;
+       for (i = 0; i < rel->part_scheme->partnatts; i++)
+       {
+               /*
+                * If no scan key existed for the previous column, we are done.
+                */
+               if (i > n_minkeys)
+                       need_next_min = false;
+
+               if (i > n_maxkeys)
+                       need_next_max = false;
+
+               foreach(lc1, matchedclauses[i])
+               {
+                       Expr   *clause = lfirst(lc1);
+                       Const  *rightop = (Const *) get_rightop(clause);
+                       Oid             opno = ((OpExpr *) clause)->opno,
+                                       opfamily = 
rel->part_scheme->partopfamily[i];
+                       StrategyNumber  strategy;
+
+                       strategy = get_op_opfamily_strategy(opno, opfamily);
+                       switch (strategy)
+                       {
+                               case BTLessStrategyNumber:
+                               case BTLessEqualStrategyNumber:
+                                       if (need_next_max)
+                                       {
+                                               maxkeys[i] = 
rightop->constvalue;
+                                               if (!maxkey_set[i])
+                                                       n_maxkeys++;
+                                               maxkey_set[i] = true;
+                                               max_incl = (strategy == 
BTLessEqualStrategyNumber);
+                                       }
+                                       if (strategy == BTLessStrategyNumber)
+                                               need_next_max = false;
+                                       break;
+
+                               case BTGreaterStrategyNumber:
+                               case BTGreaterEqualStrategyNumber:
+                                       if (need_next_min)
+                                       {
+                                               minkeys[i] = 
rightop->constvalue;
+                                               if (!minkey_set[i])
+                                                       n_minkeys++;
+                                               minkey_set[i] = true;
+                                               min_incl = (strategy == 
BTGreaterEqualStrategyNumber);
+                                       }
+                                       if (strategy == BTGreaterStrategyNumber)
+                                               need_next_min = false;
+                                       break;
+
+                               case BTEqualStrategyNumber:
+                                       if (need_next_min)
+                                       {
+                                               minkeys[i] = 
rightop->constvalue;
+                                               if (!minkey_set[i])
+                                                       n_minkeys++;
+                                       }
+                                       minkey_set[i] = true;
+                                       min_incl = true;
+
+                                       if (need_next_max)
+                                       {
+                                               maxkeys[i] = 
rightop->constvalue;
+                                               if (!maxkey_set[i])
+                                                       n_maxkeys++;
+                                       }
+                                       maxkey_set[i] = true;
+                                       max_incl = true;
+                                       break;
+
+                               /*
+                                * This might mean '<>', but we don't have 
anything for that
+                                * case yet.  Perhaps, handle that as key < 
const OR
+                                * key > const, once we have props needed for 
handling OR
+                                * clauses.
+                                */
+                               default:
+                                       min_incl = max_incl = false;
+                                       break;
+                       }
+               }
+       }
+
+       /* Ask partition.c which partitions it thinks match the keys. */
+       indexes = get_partitions_for_keys(parent, keynullness,
+                                                                         
minkeys, n_minkeys, min_incl,
+                                                                         
maxkeys, n_maxkeys, max_incl,
+                                                                         
&rel->painfo->min_datum_idx,
+                                                                         
&rel->painfo->max_datum_idx,
+                                                                         
&rel->painfo->contains_null_partition);
+
+       if (indexes != NIL)
+       {
+#ifdef USE_ASSERT_CHECKING
+               int             first_index,
+                               last_index;
+               first_index = linitial_int(indexes);
+               last_index = llast_int(indexes);
+               Assert(first_index <= last_index ||
+                          rel->part_scheme->strategy != 
PARTITION_STRATEGY_RANGE);
+#endif
+
+               foreach(lc1, indexes)
+               {
+                       int             partidx = lfirst_int(lc1);
+                       AppendRelInfo *appinfo = rel->child_appinfos[partidx];
+#ifdef USE_ASSERT_CHECKING
+                       RangeTblEntry *rte = 
planner_rt_fetch(appinfo->child_relid, root);
+                       Assert(partdesc->oids[partidx] == rte->relid);
+#endif
+                       result = lappend(result, appinfo);
+               }
+       }
+
+       /* Remember for future users such as set_append_rel_pathlist(). */
+       rel->painfo->live_partition_appinfos = result;
+
+       heap_close(parent, NoLock);
+
+       return result;
+}
+
+/*
  * set_append_rel_size
  *       Set size estimates for a simple "append relation"
  *
@@ -859,6 +1076,7 @@ static void
 set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
                                        Index rti, RangeTblEntry *rte)
 {
+       List       *rel_appinfos = NIL;
        int                     parentRTindex = rti;
        bool            has_live_children;
        double          parent_rows;
@@ -869,6 +1087,24 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 
        Assert(IS_SIMPLE_REL(rel));
 
+       if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+       {
+               foreach (l, root->append_rel_list)
+               {
+                       AppendRelInfo   *appinfo = lfirst(l);
+
+                       /* append_rel_list contains all append rels; ignore 
others */
+                       if (appinfo->parent_relid == parentRTindex)
+                               rel_appinfos = lappend(rel_appinfos, appinfo);
+               }
+       }
+       else
+       {
+               rel_appinfos = get_rel_partitions(root, rel, rte);
+               Assert(rel->painfo != NULL);
+               rel->painfo->live_partitioned_rels = list_make1_int(rti);
+       }
+
        /*
         * Initialize to compute size estimates for whole append relation.
         *
@@ -889,7 +1125,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
        nattrs = rel->max_attr - rel->min_attr + 1;
        parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
 
-       foreach(l, root->append_rel_list)
+       foreach(l, rel_appinfos)
        {
                AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
                int                     childRTindex;
@@ -902,10 +1138,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
                ListCell   *childvars;
                ListCell   *lc;
 
-               /* append_rel_list contains all append rels; ignore others */
-               if (appinfo->parent_relid != parentRTindex)
-                       continue;
-
                childRTindex = appinfo->child_relid;
                childRTE = root->simple_rte_array[childRTindex];
 
@@ -1114,6 +1346,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
                has_live_children = true;
 
                /*
+                * If childrel is itself partitioned, add it and its partitioned
+                * children to the list being propagated up to the root rel.
+                */
+               if (childrel->painfo && rel->painfo)
+               {
+                       rel->painfo->live_partitioned_rels =
+                               list_concat(rel->painfo->live_partitioned_rels,
+                                          
list_copy(childrel->painfo->live_partitioned_rels));
+               }
+
+               /*
                 * If any live child is not parallel-safe, treat the whole 
appendrel
                 * as not parallel-safe.  In future we might be able to 
generate plans
                 * in which some children are farmed out to workers while 
others are
@@ -1209,14 +1452,29 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo 
*rel,
                                                Index rti, RangeTblEntry *rte)
 {
        int                     parentRTindex = rti;
-       List       *live_childrels = NIL;
+       List       *rel_appinfos = NIL,
+                          *live_childrels = NIL;
        ListCell   *l;
 
+       if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+       {
+               foreach (l, root->append_rel_list)
+               {
+                       AppendRelInfo   *appinfo = lfirst(l);
+
+                       /* append_rel_list contains all append rels; ignore 
others */
+                       if (appinfo->parent_relid == parentRTindex)
+                               rel_appinfos = lappend(rel_appinfos, appinfo);
+               }
+       }
+       else
+               rel_appinfos = rel->painfo->live_partition_appinfos;
+
        /*
         * Generate access paths for each member relation, and remember the
         * non-dummy children.
         */
-       foreach(l, root->append_rel_list)
+       foreach(l, rel_appinfos)
        {
                AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
                int                     childRTindex;
@@ -1289,40 +1547,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo 
*rel,
        RangeTblEntry *rte;
 
        rte = planner_rt_fetch(rel->relid, root);
-
-       /*
-        * Get the partitioned_rels list from root->pcinfo_list after
-        * confirming that rel is actually a root partitioned table.
-        */
-       if (rte->relkind == RELKIND_PARTITIONED_TABLE)
-       {
-               int             parent_relid;
-               bool    is_root_partitioned_table = false;
-
-               /*
-                * Normally, only the root partitioned rel will be 
RELOPT_BASEREL
-                * in a given partitione tree, except when the root table itself
-                * is a child in the case of a UNION ALL query.
-                */
-               if (!IS_OTHER_REL(rel))
-                       is_root_partitioned_table = true;
-               else if (bms_get_singleton_member(rel->top_parent_relids,
-                                                                               
  &parent_relid))
-               {
-                       RelOptInfo *parent_rel;
-
-                       parent_rel = root->simple_rel_array[parent_relid];
-                       is_root_partitioned_table =
-                                               (parent_rel->rtekind != 
RTE_RELATION);
-               }
-
-               if (is_root_partitioned_table)
-               {
-                       partitioned_rels = get_partitioned_child_rels(root, 
rel->relid);
-                       /* The root partitioned table is included as a child 
rel */
-                       Assert(list_length(partitioned_rels) >= 1);
-               }
-       }
+       if (rte->relkind == RELKIND_PARTITIONED_TABLE && IS_SIMPLE_REL(rel))
+               partitioned_rels = rel->painfo->live_partitioned_rels;
 
        /*
         * For every non-dummy child, remember the cheapest path.  Also, 
identify
diff --git a/src/backend/optimizer/util/plancat.c 
b/src/backend/optimizer/util/plancat.c
index bfc05a1af5..de50b5d86a 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -68,6 +68,8 @@ static List *get_relation_constraints(PlannerInfo *root,
 static List *build_index_tlist(PlannerInfo *root, IndexOptInfo *index,
                                  Relation heapRelation);
 static List *get_relation_statistics(RelOptInfo *rel, Relation relation);
+static void get_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
+                               Relation relation);
 
 /*
  * get_relation_info -
@@ -420,6 +422,10 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, 
bool inhparent,
        /* Collect info about relation's foreign keys, if relevant */
        get_relation_foreign_keys(root, rel, relation, inhparent);
 
+       /* Collect partitioning info, if relevant. */
+       if (relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+               get_relation_partition_info(root, rel, relation);
+
        heap_close(relation, NoLock);
 
        /*
@@ -1805,3 +1811,117 @@ has_row_triggers(PlannerInfo *root, Index rti, CmdType 
event)
        heap_close(relation, NoLock);
        return result;
 }
+
+static void
+get_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
+                                                       Relation relation)
+{
+       int     i;
+       ListCell *l;
+       PartitionKey key = RelationGetPartitionKey(relation);
+       PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+
+       rel->part_scheme = find_partition_scheme(root, relation);
+       rel->partexprs = (List **) palloc0(key->partnatts * sizeof(List *));
+
+       l = list_head(key->partexprs);
+       for (i = 0; i < key->partnatts; i++)
+       {
+               Expr *keyCol;
+
+               if (key->partattrs[i] != 0)
+               {
+                       keyCol = (Expr *) makeVar(rel->relid,
+                                                                         
key->partattrs[i],
+                                                                         
key->parttypid[i],
+                                                                         
key->parttypmod[i],
+                                                                         
key->parttypcoll[i],
+                                                                         0);
+               }
+               else
+               {
+                       if (l == NULL)
+                               elog(ERROR, "wrong number of partition key 
expressions");
+                       keyCol = (Expr *) copyObject(lfirst(l));
+                       l = lnext(l);
+               }
+
+               rel->partexprs[i] = list_make1(keyCol);
+       }
+
+       /* Values are filled in build_simple_rel(). */
+       rel->child_appinfos = (AppendRelInfo **) palloc0(partdesc->nparts *
+                                                                               
                         sizeof(AppendRelInfo *));
+
+       /*
+        * A PartitionAppendInfo to map this table to its immediate partitions
+        * that will be scanned by this query.  At the same time, it records the
+        * table's partitioning properties reflecting any partition-pruning that
+        * might occur to satisfy the query.  Rest of the fields are set in
+        * get_rel_partitions() and set_append_rel_size().
+        */
+       rel->painfo = makeNode(PartitionAppendInfo);
+       rel->painfo->boundinfo = partdesc->boundinfo;
+}
+
+/*
+ * find_partition_scheme
+ *
+ * The function returns a canonical partition scheme which exactly matches the
+ * partitioning scheme of the given relation if one exists in the list of
+ * canonical partitioning schemes maintained in PlannerInfo. If none of the
+ * existing partitioning schemes match, the function creates a canonical
+ * partition scheme and adds it to the list.
+ *
+ * For an unpartitioned table or for a multi-level partitioned table it returns
+ * NULL. See comments in the function for more details.
+ */
+PartitionScheme
+find_partition_scheme(PlannerInfo *root, Relation relation)
+{
+       ListCell   *lc;
+       PartitionKey key = RelationGetPartitionKey(relation);
+       char            strategy = key->strategy;
+       int                     partnatts = key->partnatts;
+       PartitionScheme part_scheme = NULL;
+
+       /* Search for a matching partition scheme and return if found one. */
+       foreach(lc, root->partition_schemes)
+       {
+               part_scheme = lfirst(lc);
+
+               /* Match various partitioning attributes. */
+               if (strategy != part_scheme->strategy ||
+                       partnatts != part_scheme->partnatts ||
+                       memcmp(key->parttypid, part_scheme->parttypid,
+                                  sizeof(Oid) * partnatts) != 0 ||
+                       memcmp(key->parttypmod, part_scheme->parttypmod,
+                                  sizeof(int32) * partnatts) != 0 ||
+                       memcmp(key->partcollation, part_scheme->partcollation,
+                                  sizeof(Oid) * partnatts) != 0 ||
+                       memcmp(key->partopfamily, part_scheme->partopfamily,
+                                  sizeof(Oid) * partnatts) != 0 ||
+                       memcmp(key->partopcintype, part_scheme->partopcintype,
+                                  sizeof(Oid) * partnatts) != 0)
+                       continue;
+
+               /* Found a matching partition scheme. */
+               return part_scheme;
+       }
+
+       /* Did not find matching partition scheme. Create one. */
+       part_scheme = (PartitionScheme) palloc0(sizeof(PartitionSchemeData));
+
+       part_scheme->strategy = strategy;
+       part_scheme->partnatts = partnatts;
+       part_scheme->parttypid = key->parttypid;
+       part_scheme->parttypmod = key->parttypmod;
+       part_scheme->partcollation = key->partcollation;
+       part_scheme->partopfamily = key->partopfamily;
+       part_scheme->partopcintype = key->partopcintype;
+
+       /* Add the partitioning scheme to PlannerInfo. */
+       root->partition_schemes = lappend(root->partition_schemes, part_scheme);
+
+       return part_scheme;
+}
diff --git a/src/backend/optimizer/util/relnode.c 
b/src/backend/optimizer/util/relnode.c
index 8ad0b4a669..390d3b4956 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -17,6 +17,7 @@
 #include <limits.h>
 
 #include "miscadmin.h"
+#include "catalog/pg_class.h"
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
@@ -163,6 +164,11 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo 
*parent)
        else
                rel->top_parent_relids = NULL;
 
+       rel->child_appinfos = NULL;
+       rel->part_scheme = NULL;
+       rel->partexprs = NULL;
+       rel->painfo = NULL;
+
        /* Check type of rtable entry */
        switch (rte->rtekind)
        {
@@ -218,7 +224,18 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo 
*parent)
        if (rte->inh)
        {
                ListCell   *l;
+               AppendRelInfo **child_appinfos = NULL;
+               int                     i;
 
+               if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+               {
+                       Assert(rel->part_scheme != NULL);
+                       Assert(rel->child_appinfos != NULL);
+                       Assert(rel->painfo != NULL);
+                       child_appinfos = rel->child_appinfos;
+               }
+
+               i = 0;
                foreach(l, root->append_rel_list)
                {
                        AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
@@ -229,6 +246,9 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo 
*parent)
 
                        (void) build_simple_rel(root, appinfo->child_relid,
                                                                        rel);
+
+                       if (child_appinfos)
+                               child_appinfos[i++] = appinfo;
                }
        }
 
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2283c675e9..fd16494909 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -99,4 +99,12 @@ extern int get_partition_for_tuple(PartitionDispatch *pd,
                                                EState *estate,
                                                PartitionDispatchData 
**failed_at,
                                                TupleTableSlot **failed_slot);
+
+/* Planner support stuff. */
+extern List *get_partitions_for_keys(Relation rel,
+                                               NullTestType *keynullness,
+                                               Datum *minkeys, int n_minkeys, 
bool min_inclusive,
+                                               Datum *maxkeys, int n_maxkeys, 
bool max_inclusive,
+                                               int *min_datum_index, int 
*max_datum_index,
+                                               bool *null_partition_chosen);
 #endif                                                 /* PARTITION_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 27bd4f3363..63196a1211 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -261,6 +261,7 @@ typedef enum NodeTag
        T_SpecialJoinInfo,
        T_AppendRelInfo,
        T_PartitionedChildRelInfo,
+       T_PartitionAppendInfo,
        T_PlaceHolderInfo,
        T_MinMaxAggInfo,
        T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index a39e59d8ac..2b535984a7 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -266,6 +266,8 @@ typedef struct PlannerInfo
        List       *distinct_pathkeys;  /* distinctClause pathkeys, if any */
        List       *sort_pathkeys;      /* sortClause pathkeys, if any */
 
+       List       *partition_schemes;  /* List of PartitionScheme objects. */
+
        List       *initial_rels;       /* RelOptInfos we are now trying to 
join */
 
        /* Use fetch_upper_rel() to get any particular upper rel */
@@ -326,6 +328,48 @@ typedef struct PlannerInfo
        ((root)->simple_rte_array ? (root)->simple_rte_array[rti] : \
         rt_fetch(rti, (root)->parse->rtable))
 
+/*
+ * Partitioning scheme
+ *     Structure to hold partitioning scheme for a given relation.
+ *
+ * Multiple relations may be partitioned in the same way. The relations
+ * resulting from joining such relations may be partitioned in the same way as
+ * the joining relations. Similarly, relations derived from such relations by
+ * grouping, sorting may be partitioned in the same way as the underlying scan
+ * relations. All such relations partitioned in the same way share the
+ * partitioning scheme.
+ *
+ * PlannerInfo stores a list of distinct "canonical" partitioning schemes.
+ * RelOptInfo of a partitioned relation holds the pointer to "canonical"
+ * partitioning scheme.
+ *
+ * We store opclass declared input data types instead of partition key
+ * datatypes since those are the ones used to compare partition bounds instead
+ * of actual partition key data types. Since partition key data types and the
+ * opclass declared input data types are expected to be binary compatible (per
+ * ResolveOpClass()), both of those should have same byval and length
+ * properties.
+ *
+ * The structure caches information about partition key data type to be used
+ * while matching partition bounds. While comparing partition schemes we don't
+ * need to compare this information as it should be same when opclass declared
+ * input data types are same for two partitioned relations.
+ */
+typedef struct PartitionSchemeData
+{
+       char            strategy;               /* Partitioning strategy */
+       int16           partnatts;              /* Number of partitioning 
attributes */
+
+       /* The following arrays each have partnatts members. */
+       Oid                *parttypid;          /* Type OIDs */
+       int32      *parttypmod;         /* Typemod values */
+       Oid                *partcollation;      /* Partitioning collation */
+       Oid                *partopfamily;       /* Operator family OIDs */
+       Oid                *partopcintype;      /* Operator class-declared 
input type OIDs */
+} PartitionSchemeData;
+
+typedef struct PartitionSchemeData *PartitionScheme;
+
 
 /*----------
  * RelOptInfo
@@ -515,6 +559,9 @@ typedef enum RelOptKind
 /* Is the given relation an "other" relation? */
 #define IS_OTHER_REL(rel) ((rel)->reloptkind == RELOPT_OTHER_MEMBER_REL)
 
+typedef struct AppendRelInfo AppendRelInfo;
+typedef struct PartitionAppendInfo PartitionAppendInfo;
+
 typedef struct RelOptInfo
 {
        NodeTag         type;
@@ -592,6 +639,48 @@ typedef struct RelOptInfo
 
        /* used by "other" relations */
        Relids          top_parent_relids;      /* Relids of topmost parents */
+
+       /* Fields set for partitioned relations */
+
+       /*
+        * Information about the partitioning attributes, such as the number of
+        * attributes, arrays containing per-attribute type/tpymod, partitioning
+        * collation, operator family OIDs, etc.
+        */
+       PartitionScheme         part_scheme;
+
+       /*
+        * Following contains the exact identities of the individual 
partitioning
+        * attributes.  For example, if the attribute is a table's column, then
+        * it will be represented herein by a Var node for the same.  This is
+        * structured as an array of Lists with part_scheme->partnatts members,
+        * with each list containing the expression(s) corresponding to the ith
+        * partitioning attribute (0 <= i < part_schem->partnatts) of this rel.
+        * For baserels, there is just a single expression in each slot (the ith
+        * list) of the array, because it corresponds to just one table.  But 
for
+        * a joinrel, there will be as many expressions as there are tables
+        * involved in that joinrel.  We have to do it that way, because in the
+        * joinrel case, the same corresponding partitioning attribute may have
+        * different identities in different tables involved in the join; for
+        * example, a Var node's varno will differ and so might varattnos.
+        */
+       List                      **partexprs;
+
+       /* AppendRelInfos of *all* partitions of the table. */
+       AppendRelInfo **child_appinfos;
+
+       /*
+        * For a partitioned relation, the following represents the identities
+        * of its live partition (their RT indexes) and some informations about
+        * the bounds that the live partitions satisfy.
+        */
+       PartitionAppendInfo *painfo;
+
+       /*
+        * RT index of the root partitioned table in the the partition tree of
+        * which this rel is a member.
+        */
+       Index   root_parent_relid;
 } RelOptInfo;
 
 /*
@@ -2031,6 +2120,52 @@ typedef struct PartitionedChildRelInfo
        List       *child_rels;
 } PartitionedChildRelInfo;
 
+/* Forward declarations, to avoid including other headers */
+typedef struct PartitionDispatchData *PartitionDispatch;
+typedef struct PartitionBoundInfoData *PartitionBoundInfo;
+typedef struct PartitionKeyData *PartitionKey;
+
+/*
+ * PartitionAppendInfo - Properties of partitions contained in the Append path
+ *                                              of a given partitioned table
+ */
+typedef struct PartitionAppendInfo
+{
+       NodeTag         type;
+
+       /*
+        * List of AppendRelInfos of the table's partitions that satisfy a given
+        * query.
+        */
+       List       *live_partition_appinfos;
+
+       /*
+        * RT indexes of live partitions that are partitioned tables themselves.
+        * This includes the RT index of the table itself.
+        */
+       List       *live_partitioned_rels;
+
+       /*
+        * The following simply copies the pointer to boundinfo in the table's
+        * PartitionDesc.
+        */
+       PartitionBoundInfo boundinfo;
+
+       /*
+        * Indexes in the boundinfo->datums array of the smallest and the 
largest
+        * value of the partition key that the query allows.  They are set by
+        * calling get_partitions_for_keys().
+        */
+       int             min_datum_idx;
+       int             max_datum_idx;
+
+       /*
+        * Does this Append contain the null-accepting partition, if one exists
+        * and is allowed by the query's quals.
+        */
+       bool    contains_null_partition;
+} PartitionAppendInfo;
+
 /*
  * For each distinct placeholder expression generated during planning, we
  * store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h
index 71f0faf938..c45db074c6 100644
--- a/src/include/optimizer/plancat.h
+++ b/src/include/optimizer/plancat.h
@@ -56,5 +56,7 @@ extern Selectivity join_selectivity(PlannerInfo *root,
                                 SpecialJoinInfo *sjinfo);
 
 extern bool has_row_triggers(PlannerInfo *root, Index rti, CmdType event);
+extern PartitionScheme find_partition_scheme(PlannerInfo *root,
+                               Relation relation);
 
 #endif                                                 /* PLANCAT_H */
-- 
2.11.0

From 1921fc38dee9bd89f42f89012dd1d57ead4dc951 Mon Sep 17 00:00:00 2001
From: amit <amitlangot...@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH 3/5] WIP: Interface changes for partition_bound_{cmp/bsearch}

Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch.  It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed.  Slightly tweaking the existing interface to allow
specifying the same seems awkward.  So, instead encapsulate that
into PartitionBoundCmpArg.  Also, modify partition_rbound_datum_cmp
to compare caller-specifed number of datums, instead of
key->partnatts datums.
---
 src/backend/catalog/partition.c | 123 +++++++++++++++++++++++++++++-----------
 1 file changed, 90 insertions(+), 33 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index bb3009e5b3..14876f8ea3 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -105,6 +105,30 @@ typedef struct PartitionRangeBound
        bool            lower;                  /* this is the lower (vs upper) 
bound */
 } PartitionRangeBound;
 
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ *                                               partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the user-defined
+ * partition bound of a given existing partition, while an instance of the
+ * following struct describes either a new partition bound being compared
+ * against existing bounds (is_bound is true in that case and either lbound
+ * or rbound is set), or a new tuple's partition key specified in datums
+ * (ndatums = number of partition key columns).
+ */
+typedef struct PartitionBoundCmpArg
+{
+       bool    is_bound;
+       union
+       {
+               PartitionListValue         *lbound;
+               PartitionRangeBound        *rbound;
+       }       bound;
+
+       Datum  *datums;
+       int             ndatums;
+} PartitionBoundCmpArg;
+
 static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
                                                           void *arg);
 static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -131,14 +155,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
                                         bool lower1, PartitionRangeBound *b2);
 static int32 partition_rbound_datum_cmp(PartitionKey key,
                                                   Datum *rb_datums, 
PartitionRangeDatumKind *rb_kind,
-                                                  Datum *tuple_datums);
+                                                  Datum *tuple_datums, int 
n_tuple_datums);
 
 static int32 partition_bound_cmp(PartitionKey key,
                                        PartitionBoundInfo boundinfo,
-                                       int offset, void *probe, bool 
probe_is_bound);
+                                       int offset, PartitionBoundCmpArg *arg);
 static int partition_bound_bsearch(PartitionKey key,
                                                PartitionBoundInfo boundinfo,
-                                               void *probe, bool 
probe_is_bound, bool *is_equal);
+                                               PartitionBoundCmpArg *arg,
+                                               bool *is_equal);
 
 /*
  * RelationBuildPartitionDesc
@@ -663,10 +688,16 @@ check_new_partition_bound(char *relname, Relation parent,
                                                {
                                                        int                     
offset;
                                                        bool            equal;
-
+                                                       PartitionListValue 
lbound;
+                                                       PartitionBoundCmpArg 
arg;
+
+                                                       memset(&lbound, 0, 
sizeof(PartitionListValue));
+                                                       lbound.value = 
val->constvalue;
+                                                       memset(&arg, 0, 
sizeof(PartitionBoundCmpArg));
+                                                       arg.is_bound = true;
+                                                       arg.bound.lbound = 
&lbound;
                                                        offset = 
partition_bound_bsearch(key, boundinfo,
-                                                                               
                                         &val->constvalue,
-                                                                               
                                         true, &equal);
+                                                                               
                                         &arg, &equal);
                                                        if (offset >= 0 && 
equal)
                                                        {
                                                                overlap = true;
@@ -717,6 +748,7 @@ check_new_partition_bound(char *relname, Relation parent,
                                        PartitionBoundInfo boundinfo = 
partdesc->boundinfo;
                                        int                     offset;
                                        bool            equal;
+                                       PartitionBoundCmpArg    arg;
 
                                        Assert(boundinfo && boundinfo->ndatums 
> 0 &&
                                                   boundinfo->strategy == 
PARTITION_STRATEGY_RANGE);
@@ -736,8 +768,11 @@ check_new_partition_bound(char *relname, Relation parent,
                                         * since the index array is initialised 
with an extra -1
                                         * at the end.
                                         */
-                                       offset = partition_bound_bsearch(key, 
boundinfo, lower,
-                                                                               
                         true, &equal);
+                                       memset(&arg, 0, 
sizeof(PartitionBoundCmpArg));
+                                       arg.is_bound = true;
+                                       arg.bound.rbound = lower;
+                                       offset = partition_bound_bsearch(key, 
boundinfo, &arg,
+                                                                               
                         &equal);
 
                                        if (boundinfo->indexes[offset + 1] < 0)
                                        {
@@ -751,9 +786,9 @@ check_new_partition_bound(char *relname, Relation parent,
                                                {
                                                        int32           cmpval;
 
+                                                       arg.bound.rbound = 
upper;
                                                        cmpval = 
partition_bound_cmp(key, boundinfo,
-                                                                               
                                 offset + 1, upper,
-                                                                               
                                 true);
+                                                                               
                                 offset + 1, &arg);
                                                        if (cmpval < 0)
                                                        {
                                                                /*
@@ -2037,9 +2072,14 @@ get_partition_for_tuple(PartitionDispatch *pd,
                {
                        /* Else bsearch in partdesc->boundinfo */
                        bool            equal = false;
+                       PartitionBoundCmpArg    arg;
 
+                       memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+                       arg.is_bound = false;
+                       arg.datums = values;
+                       arg.ndatums = key->partnatts;
                        cur_offset = partition_bound_bsearch(key, 
partdesc->boundinfo,
-                                                                               
                 values, false, &equal);
+                                                                               
                 &arg, &equal);
                        switch (key->strategy)
                        {
                                case PARTITION_STRATEGY_LIST:
@@ -2237,12 +2277,12 @@ partition_rbound_cmp(PartitionKey key,
 static int32
 partition_rbound_datum_cmp(PartitionKey key,
                                                   Datum *rb_datums, 
PartitionRangeDatumKind *rb_kind,
-                                                  Datum *tuple_datums)
+                                                  Datum *tuple_datums, int 
n_tuple_datums)
 {
        int                     i;
        int32           cmpval = -1;
 
-       for (i = 0; i < key->partnatts; i++)
+       for (i = 0; i < n_tuple_datums; i++)
        {
                if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
                        return -1;
@@ -2264,11 +2304,11 @@ partition_rbound_datum_cmp(PartitionKey key,
  * partition_bound_cmp
  *
  * Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
  */
 static int32
 partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
-                                       int offset, void *probe, bool 
probe_is_bound)
+                                       int offset, PartitionBoundCmpArg *arg)
 {
        Datum      *bound_datums = boundinfo->datums[offset];
        int32           cmpval = -1;
@@ -2276,17 +2316,35 @@ partition_bound_cmp(PartitionKey key, 
PartitionBoundInfo boundinfo,
        switch (key->strategy)
        {
                case PARTITION_STRATEGY_LIST:
-                       cmpval = 
DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
-                                                                               
                         key->partcollation[0],
-                                                                               
                         bound_datums[0],
-                                                                               
                         *(Datum *) probe));
-                       break;
+                       {
+                               Datum   listdatum;
+
+                               if (arg->is_bound)
+                                       listdatum = arg->bound.lbound->value;
+                               else
+                               {
+                                       if (arg->ndatums >= 1)
+                                               listdatum = arg->datums[0];
+                                       /*
+                                        * If no tuple datum to compare with 
the bound, consider
+                                        * the latter to be greater.
+                                        */
+                                       else
+                                               return 1;
+                               }
+
+                               cmpval = 
DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+                                                                               
                                 key->partcollation[0],
+                                                                               
                                 bound_datums[0],
+                                                                               
                                 listdatum));
+                               break;
+                       }
 
                case PARTITION_STRATEGY_RANGE:
                        {
                                PartitionRangeDatumKind *kind = 
boundinfo->kind[offset];
 
-                               if (probe_is_bound)
+                               if (arg->is_bound)
                                {
                                        /*
                                         * We need to pass whether the existing 
bound is a lower
@@ -2297,12 +2355,13 @@ partition_bound_cmp(PartitionKey key, 
PartitionBoundInfo boundinfo,
 
                                        cmpval = partition_rbound_cmp(key,
                                                                                
                  bound_datums, kind, lower,
-                                                                               
                  (PartitionRangeBound *) probe);
+                                                                               
                  arg->bound.rbound);
                                }
                                else
                                        cmpval = partition_rbound_datum_cmp(key,
                                                                                
                                bound_datums, kind,
-                                                                               
                                (Datum *) probe);
+                                                                               
                                arg->datums,
+                                                                               
                                arg->ndatums);
                                break;
                        }
 
@@ -2316,20 +2375,19 @@ partition_bound_cmp(PartitionKey key, 
PartitionBoundInfo boundinfo,
 
 /*
  * Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
  *
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg could either be a partition bound or a Datum array representing
+ * the partition key of a tuple being routed.  We simply pass that down to
+ * partition_bound_cmp where it is interpreted appropriately.
  *
  * *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *arg.
  */
 static int
 partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
-                                               void *probe, bool 
probe_is_bound, bool *is_equal)
+                                               PartitionBoundCmpArg *arg, bool 
*is_equal)
 {
        int                     lo,
                                hi,
@@ -2342,8 +2400,7 @@ partition_bound_bsearch(PartitionKey key, 
PartitionBoundInfo boundinfo,
                int32           cmpval;
 
                mid = (lo + hi + 1) / 2;
-               cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
-                                                                        
probe_is_bound);
+               cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
                if (cmpval <= 0)
                {
                        lo = mid;
-- 
2.11.0

From 240d2d65fbb4af5959df8e2e1ae576bf46440d4d Mon Sep 17 00:00:00 2001
From: amit <amitlangot...@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH 4/5] WIP: Implement get_partitions_for_keys()

Disable constraint exclusion that occurs using internal partition
constraints, so that it's apparent what the new partition-pruning
code still needs to do to able to create a plan matching the plain
the the traditional constraint exclusion based partition-pruning
would result in.
---
 src/backend/catalog/partition.c      | 193 ++++++++++++++++++++++++++++++++++-
 src/backend/optimizer/util/plancat.c |   4 +
 2 files changed, 192 insertions(+), 5 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 14876f8ea3..617900f62f 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1194,8 +1194,6 @@ RelationGetPartitionDispatchInfo(Relation rel,
  *     datum that the query's bounding keys allow to be returned for the query.
  *     Similarly, *max_datum_index.  *null_partition_chosen returns whether
  *     the null partition will be scanned.
- *
- * TODO: Implement.
  */
 List *
 get_partitions_for_keys(Relation rel,
@@ -1205,12 +1203,197 @@ get_partitions_for_keys(Relation rel,
                                                int *min_datum_index, int 
*max_datum_index,
                                                bool *null_partition_chosen)
 {
+       int             i,
+                       minoff,
+                       maxoff;
        List   *result = NIL;
-       int             i;
+       PartitionKey    partkey = RelationGetPartitionKey(rel);
        PartitionDesc   partdesc = RelationGetPartitionDesc(rel);
+       PartitionBoundCmpArg    arg;
+       bool    is_equal;
+       int             null_partition_idx = partdesc->boundinfo->null_index;
+
+       *null_partition_chosen = false;
+
+       /*
+        * Check if any of the scan keys are null.  If so, return the only
+        * null-accepting partition if partdesc->boundinfo says there is one.
+        */
+       for (i = 0; i < partkey->partnatts; i++)
+       {
+               if (keynullness[i] == IS_NULL)
+               {
+                       if (null_partition_idx >= 0)
+                       {
+                               *null_partition_chosen = true;
+                               result = list_make1_int(null_partition_idx);
+                       }
+                       else
+                               result = NIL;
+
+                       return result;
+               }
+       }
+
+       if (n_minkeys > 0 && partdesc->nparts > 0)
+       {
+               memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+               arg.datums = minkeys;
+               arg.ndatums = n_minkeys;
+               minoff = partition_bound_bsearch(partkey, partdesc->boundinfo,
+                                                                               
        &arg, &is_equal);
+
+               if (is_equal && arg.ndatums < partkey->partnatts)
+               {
+                       int32   cmpval;
+
+                       is_equal = false;
+
+                       do
+                       {
+                               if (min_inclusive)
+                                       minoff -= 1;
+                               else
+                                       minoff += 1;
+                               if (minoff < 0 ||
+                                       minoff >= partdesc->boundinfo->ndatums)
+                                       break;
+                               cmpval = partition_bound_cmp(partkey, 
partdesc->boundinfo,
+                                                                               
                 minoff, &arg);
+                       } while (cmpval == 0);
+               }
+
+               /* Interpret the result per partition strategy. */
+               switch (partkey->strategy)
+               {
+                       case PARTITION_STRATEGY_LIST:
+                               /*
+                                * Found, but if the query may have asked us to 
exclude it.
+                                */
+                               if (is_equal && !min_inclusive)
+                                       minoff++;
+                               break;
+
+                       case PARTITION_STRATEGY_RANGE:
+                               /*
+                                * Records returned by the query will be > 
bounds[minoff],
+                                * because min_scankey is >= bounds[minoff], 
that is, no
+                                * records of the partition at minoff will be 
returned.  Go
+                                * to the next bound.
+                                */
+                               if (minoff < partdesc->boundinfo->ndatums - 1)
+                                       minoff += 1;
+
+                               /*
+                                * Make sure to skip a gap.
+                                * Note: There are ndatums + 1 lots in the 
indexes array.
+                                */
+                               if (partdesc->boundinfo->indexes[minoff] < 0 &&
+                                       partdesc->boundinfo->indexes[minoff + 
1] >= 0)
+                                       minoff += 1;
+                               break;
+               }
+       }
+       else
+               minoff = 0;
+
+       if (n_maxkeys > 0 && partdesc->nparts > 0)
+       {
+               memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+               arg.datums = maxkeys;
+               arg.ndatums = n_maxkeys;
+               maxoff = partition_bound_bsearch(partkey, partdesc->boundinfo,
+                                                                               
        &arg, &is_equal);
+               if (is_equal && arg.ndatums < partkey->partnatts)
+               {
+                       int32   cmpval;
 
-       for (i = 0; i < partdesc->nparts; i++)
-               result = lappend_int(result, i);
+                       is_equal = false;
+
+                       do
+                       {
+                               if (max_inclusive)
+                                       maxoff += 1;
+                               else
+                                       maxoff -= 1;
+                               if (maxoff < 0 ||
+                                       maxoff >= partdesc->boundinfo->ndatums)
+                                       break;
+                               cmpval = partition_bound_cmp(partkey, 
partdesc->boundinfo,
+                                                                               
         maxoff, &arg);
+                       } while (cmpval == 0);
+
+                       /* Back up if went too far. */
+                       if (max_inclusive)
+                               maxoff -= 1;
+               }
+
+               /* Interpret the result per partition strategy. */
+               switch (partkey->strategy)
+               {
+                       case PARTITION_STRATEGY_LIST:
+                               /*
+                                * Found, but if the query may have asked us to 
exclude it.
+                                */
+                               if (is_equal && !max_inclusive)
+                                       maxoff--;
+                               break;
+
+                       case PARTITION_STRATEGY_RANGE:
+                               /*
+                                * Because bounds[maxoff] <= max_scankey, we 
may need to
+                                * to consider the next partition as well, in 
addition to
+                                * the partition at maxoff and earlier.
+                                */
+                               if (!is_equal || max_inclusive)
+                                       maxoff += 1;
+
+                               /* Make sure to skip a gap. */
+                               if (partdesc->boundinfo->indexes[maxoff] < 0 && 
maxoff >= 1)
+                                       maxoff -= 1;
+                               break;
+               }
+       }
+       else
+               maxoff = partdesc->boundinfo->ndatums - 1;
+
+       *min_datum_index = minoff;
+       *max_datum_index = maxoff;
+
+       switch (partkey->strategy)
+       {
+               case PARTITION_STRATEGY_LIST:
+                       for (i = minoff; i <= maxoff; i++)
+                       {
+                               int             partition_idx = 
partdesc->boundinfo->indexes[i];
+
+                               /*
+                                * Multiple values may belong to the same 
partition, so make
+                                * sure we don't add the same partition index 
again.
+                                */
+                               result = list_append_unique_int(result, 
partition_idx);
+                       }
+
+                       /* If no bounding keys exist, include the null 
partition too. */
+                       if (null_partition_idx >= 0 &&
+                               (keynullness[0] == -1 || keynullness[0] != 
IS_NOT_NULL))
+                       {
+                               *null_partition_chosen = true;
+                               result = list_append_unique_int(result, 
null_partition_idx);
+                       }
+
+                       break;
+
+               case PARTITION_STRATEGY_RANGE:
+                       for (i = minoff; i <= maxoff; i++)
+                       {
+                               int             partition_idx = 
partdesc->boundinfo->indexes[i];
+
+                               if (partition_idx >= 0)
+                                       result = lappend_int(result, 
partition_idx);
+                       }
+                       break;
+       }
 
        return result;
 }
diff --git a/src/backend/optimizer/util/plancat.c 
b/src/backend/optimizer/util/plancat.c
index de50b5d86a..ec51d89caa 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1155,7 +1155,9 @@ get_relation_constraints(PlannerInfo *root,
        Index           varno = rel->relid;
        Relation        relation;
        TupleConstr *constr;
+#ifdef USE_PARTITION_CONSTRAINT_FOR_PRUNING
        List       *pcqual;
+#endif
 
        /*
         * We assume the relation has already been safely locked.
@@ -1241,6 +1243,7 @@ get_relation_constraints(PlannerInfo *root,
                }
        }
 
+#ifdef USE_PARTITION_CONSTRAINT_FOR_PRUNING
        /* Append partition predicates, if any */
        pcqual = RelationGetPartitionQual(relation);
        if (pcqual)
@@ -1258,6 +1261,7 @@ get_relation_constraints(PlannerInfo *root,
 
                result = list_concat(result, pcqual);
        }
+#endif
 
        heap_close(relation, NoLock);
 
-- 
2.11.0

From a8808856cb8a6e4553c72b8c0418a69b2bb1aa47 Mon Sep 17 00:00:00 2001
From: amit <amitlangot...@gmail.com>
Date: Tue, 5 Sep 2017 10:28:23 +0900
Subject: [PATCH 5/5] Add more tests for the new partitioning-related planning
 code

---
 src/test/regress/expected/partition.out | 449 ++++++++++++++++++++++++++++++++
 src/test/regress/parallel_schedule      |   2 +-
 src/test/regress/serial_schedule        |   1 +
 src/test/regress/sql/partition.sql      |  80 ++++++
 4 files changed, 531 insertions(+), 1 deletion(-)
 create mode 100644 src/test/regress/expected/partition.out
 create mode 100644 src/test/regress/sql/partition.sql

diff --git a/src/test/regress/expected/partition.out 
b/src/test/regress/expected/partition.out
new file mode 100644
index 0000000000..ae5f59e3a6
--- /dev/null
+++ b/src/test/regress/expected/partition.out
@@ -0,0 +1,449 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+        QUERY PLAN         
+---------------------------
+ Append
+   ->  Seq Scan on lp_ad
+   ->  Seq Scan on lp_bc
+   ->  Seq Scan on lp_ef
+   ->  Seq Scan on lp_null
+(5 rows)
+
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+                        QUERY PLAN                         
+-----------------------------------------------------------
+ Append
+   ->  Seq Scan on lp_bc
+         Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+(3 rows)
+
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Append
+   ->  Seq Scan on lp_bc
+         Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+   ->  Seq Scan on lp_ad
+         Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a = 'a';
+            QUERY PLAN             
+-----------------------------------
+ Append
+   ->  Seq Scan on lp_ad
+         Filter: (a = 'a'::bpchar)
+(3 rows)
+
+explain (costs off) select * from lp where a is not null;
+           QUERY PLAN            
+---------------------------------
+ Append
+   ->  Seq Scan on lp_ad
+         Filter: (a IS NOT NULL)
+   ->  Seq Scan on lp_bc
+         Filter: (a IS NOT NULL)
+   ->  Seq Scan on lp_ef
+         Filter: (a IS NOT NULL)
+(7 rows)
+
+explain (costs off) select * from lp where a is null;
+         QUERY PLAN          
+-----------------------------
+ Append
+   ->  Seq Scan on lp_null
+         Filter: (a IS NULL)
+(3 rows)
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+create table rlp4 partition of rlp for values from (20) to (30);
+create table rlp5 partition of rlp for values from (31) to (maxvalue);
+explain (costs off) select * from rlp where a < 1;
+       QUERY PLAN        
+-------------------------
+ Append
+   ->  Seq Scan on rlp1
+         Filter: (a < 1)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 1;
+        QUERY PLAN        
+--------------------------
+ Append
+   ->  Seq Scan on rlp1
+         Filter: (a <= 1)
+   ->  Seq Scan on rlp2
+         Filter: (a <= 1)
+(5 rows)
+
+explain (costs off) select * from rlp where a = 1;
+       QUERY PLAN        
+-------------------------
+ Append
+   ->  Seq Scan on rlp2
+         Filter: (a = 1)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::bigint;             /* same 
as above */
+            QUERY PLAN             
+-----------------------------------
+ Append
+   ->  Seq Scan on rlp2
+         Filter: (a = '1'::bigint)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::numeric;    /* no pruning */
+                  QUERY PLAN                   
+-----------------------------------------------
+ Append
+   ->  Seq Scan on rlp1
+         Filter: ((a)::numeric = '1'::numeric)
+   ->  Seq Scan on rlp2
+         Filter: ((a)::numeric = '1'::numeric)
+   ->  Seq Scan on rlp3abcd
+         Filter: ((a)::numeric = '1'::numeric)
+   ->  Seq Scan on rlp3efgh
+         Filter: ((a)::numeric = '1'::numeric)
+   ->  Seq Scan on rlp3nullxy
+         Filter: ((a)::numeric = '1'::numeric)
+   ->  Seq Scan on rlp4
+         Filter: ((a)::numeric = '1'::numeric)
+   ->  Seq Scan on rlp5
+         Filter: ((a)::numeric = '1'::numeric)
+(15 rows)
+
+explain (costs off) select * from rlp where a <= 10;
+        QUERY PLAN         
+---------------------------
+ Append
+   ->  Seq Scan on rlp1
+         Filter: (a <= 10)
+   ->  Seq Scan on rlp2
+         Filter: (a <= 10)
+(5 rows)
+
+explain (costs off) select * from rlp where a > 10;
+          QUERY PLAN          
+------------------------------
+ Append
+   ->  Seq Scan on rlp3abcd
+         Filter: (a > 10)
+   ->  Seq Scan on rlp3efgh
+         Filter: (a > 10)
+   ->  Seq Scan on rlp3nullxy
+         Filter: (a > 10)
+   ->  Seq Scan on rlp4
+         Filter: (a > 10)
+   ->  Seq Scan on rlp5
+         Filter: (a > 10)
+(11 rows)
+
+explain (costs off) select * from rlp where a < 15;
+        QUERY PLAN        
+--------------------------
+ Append
+   ->  Seq Scan on rlp1
+         Filter: (a < 15)
+   ->  Seq Scan on rlp2
+         Filter: (a < 15)
+(5 rows)
+
+explain (costs off) select * from rlp where a <= 15;
+          QUERY PLAN          
+------------------------------
+ Append
+   ->  Seq Scan on rlp1
+         Filter: (a <= 15)
+   ->  Seq Scan on rlp2
+         Filter: (a <= 15)
+   ->  Seq Scan on rlp3abcd
+         Filter: (a <= 15)
+   ->  Seq Scan on rlp3efgh
+         Filter: (a <= 15)
+   ->  Seq Scan on rlp3nullxy
+         Filter: (a <= 15)
+(11 rows)
+
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+                       QUERY PLAN                        
+---------------------------------------------------------
+ Append
+   ->  Seq Scan on rlp3abcd
+         Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+   ->  Seq Scan on rlp4
+         Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+   ->  Seq Scan on rlp5
+         Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+(7 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is null;
+                 QUERY PLAN                 
+--------------------------------------------
+ Append
+   ->  Seq Scan on rlp3nullxy
+         Filter: ((b IS NULL) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is not null;
+                   QUERY PLAN                   
+------------------------------------------------
+ Append
+   ->  Seq Scan on rlp3abcd
+         Filter: ((b IS NOT NULL) AND (a = 16))
+   ->  Seq Scan on rlp3efgh
+         Filter: ((b IS NOT NULL) AND (a = 16))
+   ->  Seq Scan on rlp3nullxy
+         Filter: ((b IS NOT NULL) AND (a = 16))
+(7 rows)
+
+explain (costs off) select * from rlp where a is null;         /* while we're 
on nulls */
+        QUERY PLAN        
+--------------------------
+ Result
+   One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from rlp where a > 30;
+        QUERY PLAN        
+--------------------------
+ Append
+   ->  Seq Scan on rlp5
+         Filter: (a > 30)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 30;    /* empty */
+        QUERY PLAN        
+--------------------------
+ Result
+   One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from rlp where a <= 31;
+          QUERY PLAN          
+------------------------------
+ Append
+   ->  Seq Scan on rlp1
+         Filter: (a <= 31)
+   ->  Seq Scan on rlp2
+         Filter: (a <= 31)
+   ->  Seq Scan on rlp3abcd
+         Filter: (a <= 31)
+   ->  Seq Scan on rlp3efgh
+         Filter: (a <= 31)
+   ->  Seq Scan on rlp3nullxy
+         Filter: (a <= 31)
+   ->  Seq Scan on rlp4
+         Filter: (a <= 31)
+   ->  Seq Scan on rlp5
+         Filter: (a <= 31)
+(15 rows)
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p0 partition of mc3p for values from (minvalue, 0, 0) to (1, 
1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 
10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 
20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, 
maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 
10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 
20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to 
(maxvalue, 0, 0);
+explain (costs off) select * from mc3p where a = 1;
+       QUERY PLAN        
+-------------------------
+ Append
+   ->  Seq Scan on mc3p0
+         Filter: (a = 1)
+   ->  Seq Scan on mc3p1
+         Filter: (a = 1)
+(5 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+                 QUERY PLAN                 
+--------------------------------------------
+ Append
+   ->  Seq Scan on mc3p0
+         Filter: ((a = 1) AND (abs(b) < 1))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+                 QUERY PLAN                 
+--------------------------------------------
+ Append
+   ->  Seq Scan on mc3p0
+         Filter: ((a = 1) AND (abs(b) = 1))
+   ->  Seq Scan on mc3p1
+         Filter: ((a = 1) AND (abs(b) = 1))
+(5 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Append
+   ->  Seq Scan on mc3p0
+         Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+   ->  Seq Scan on mc3p1
+         Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+(5 rows)
+
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 
35;
+                           QUERY PLAN                            
+-----------------------------------------------------------------
+ Append
+   ->  Seq Scan on mc3p1
+         Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+   ->  Seq Scan on mc3p2
+         Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+   ->  Seq Scan on mc3p3
+         Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+   ->  Seq Scan on mc3p4
+         Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+(9 rows)
+
+explain (costs off) select * from mc3p where a > 10;
+        QUERY PLAN        
+--------------------------
+ Append
+   ->  Seq Scan on mc3p5
+         Filter: (a > 10)
+   ->  Seq Scan on mc3p6
+         Filter: (a > 10)
+   ->  Seq Scan on mc3p7
+         Filter: (a > 10)
+(7 rows)
+
+explain (costs off) select * from mc3p where a >= 10;
+        QUERY PLAN         
+---------------------------
+ Append
+   ->  Seq Scan on mc3p1
+         Filter: (a >= 10)
+   ->  Seq Scan on mc3p2
+         Filter: (a >= 10)
+   ->  Seq Scan on mc3p3
+         Filter: (a >= 10)
+   ->  Seq Scan on mc3p4
+         Filter: (a >= 10)
+   ->  Seq Scan on mc3p5
+         Filter: (a >= 10)
+   ->  Seq Scan on mc3p6
+         Filter: (a >= 10)
+   ->  Seq Scan on mc3p7
+         Filter: (a >= 10)
+(15 rows)
+
+explain (costs off) select * from mc3p where a < 10;
+        QUERY PLAN        
+--------------------------
+ Append
+   ->  Seq Scan on mc3p0
+         Filter: (a < 10)
+   ->  Seq Scan on mc3p1
+         Filter: (a < 10)
+(5 rows)
+
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+                  QUERY PLAN                   
+-----------------------------------------------
+ Append
+   ->  Seq Scan on mc3p0
+         Filter: ((a <= 10) AND (abs(b) < 10))
+   ->  Seq Scan on mc3p1
+         Filter: ((a <= 10) AND (abs(b) < 10))
+   ->  Seq Scan on mc3p2
+         Filter: ((a <= 10) AND (abs(b) < 10))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0;    /* 
empty */
+        QUERY PLAN        
+--------------------------
+ Result
+   One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 
100;
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Append
+   ->  Seq Scan on mc3p6
+         Filter: ((a = 20) AND (c = 100) AND (abs(b) = 10))
+(3 rows)
+
+explain (costs off) select * from mc3p where a > 20;
+        QUERY PLAN        
+--------------------------
+ Append
+   ->  Seq Scan on mc3p7
+         Filter: (a > 20)
+(3 rows)
+
+explain (costs off) select * from mc3p where a >= 20;
+        QUERY PLAN         
+---------------------------
+ Append
+   ->  Seq Scan on mc3p5
+         Filter: (a >= 20)
+   ->  Seq Scan on mc3p6
+         Filter: (a >= 20)
+   ->  Seq Scan on mc3p7
+         Filter: (a >= 20)
+(7 rows)
+
+-- XXX - '<>' clauses cannot be handled yet
+explain (costs off) select * from lp where a <> 'a';
+             QUERY PLAN             
+------------------------------------
+ Append
+   ->  Seq Scan on lp_ad
+         Filter: (a <> 'a'::bpchar)
+   ->  Seq Scan on lp_bc
+         Filter: (a <> 'a'::bpchar)
+   ->  Seq Scan on lp_ef
+         Filter: (a <> 'a'::bpchar)
+(7 rows)
+
+-- XXX - redundant clause elimination does not happen yet
+explain (costs off) select * from mc3p where a = 10 and a > 1;
+               QUERY PLAN               
+----------------------------------------
+ Append
+   ->  Seq Scan on mc3p0
+         Filter: ((a > 1) AND (a = 10))
+   ->  Seq Scan on mc3p1
+         Filter: ((a > 1) AND (a = 10))
+   ->  Seq Scan on mc3p2
+         Filter: ((a > 1) AND (a = 10))
+   ->  Seq Scan on mc3p3
+         Filter: ((a > 1) AND (a = 10))
+   ->  Seq Scan on mc3p4
+         Filter: ((a > 1) AND (a = 10))
+(11 rows)
+
+-- XXX - the OR clauses don't contribute to partition-pruning yet
+explain (costs off) select * from rlp3 where b = 'ab' or b = 'ef';
+                               QUERY PLAN                               
+------------------------------------------------------------------------
+ Append
+   ->  Seq Scan on rlp3abcd
+         Filter: (((b)::text = 'ab'::text) OR ((b)::text = 'ef'::text))
+   ->  Seq Scan on rlp3efgh
+         Filter: (((b)::text = 'ab'::text) OR ((b)::text = 'ef'::text))
+   ->  Seq Scan on rlp3nullxy
+         Filter: (((b)::text = 'ab'::text) OR ((b)::text = 'ef'::text))
+(7 rows)
+
+drop table lp, rlp, mc3p;
diff --git a/src/test/regress/parallel_schedule 
b/src/test/regress/parallel_schedule
index 2fd3f2b1b1..2eb81fcf41 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -60,7 +60,7 @@ test: create_index create_view
 # ----------
 # Another group of parallel tests
 # ----------
-test: create_aggregate create_function_3 create_cast constraints triggers 
inherit create_table_like typed_table vacuum drop_if_exists updatable_views 
rolenames roleattributes create_am hash_func
+test: create_aggregate create_function_3 create_cast constraints triggers 
inherit create_table_like typed_table vacuum drop_if_exists updatable_views 
rolenames roleattributes create_am hash_func partition
 
 # ----------
 # sanity_check does a vacuum, affecting the sort order of SELECT *
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 76b0de30a7..6611662149 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -71,6 +71,7 @@ test: create_cast
 test: constraints
 test: triggers
 test: inherit
+test: partition
 test: create_table_like
 test: typed_table
 test: vacuum
diff --git a/src/test/regress/sql/partition.sql 
b/src/test/regress/sql/partition.sql
new file mode 100644
index 0000000000..423ac3726f
--- /dev/null
+++ b/src/test/regress/sql/partition.sql
@@ -0,0 +1,80 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+explain (costs off) select * from lp where a = 'a';
+explain (costs off) select * from lp where a is not null;
+explain (costs off) select * from lp where a is null;
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+
+create table rlp4 partition of rlp for values from (20) to (30);
+create table rlp5 partition of rlp for values from (31) to (maxvalue);
+
+explain (costs off) select * from rlp where a < 1;
+explain (costs off) select * from rlp where a <= 1;
+explain (costs off) select * from rlp where a = 1;
+explain (costs off) select * from rlp where a = 1::bigint;             /* same 
as above */
+explain (costs off) select * from rlp where a = 1::numeric;    /* no pruning */
+explain (costs off) select * from rlp where a <= 10;
+explain (costs off) select * from rlp where a > 10;
+explain (costs off) select * from rlp where a < 15;
+explain (costs off) select * from rlp where a <= 15;
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+explain (costs off) select * from rlp where a = 16 and b is null;
+explain (costs off) select * from rlp where a = 16 and b is not null;
+explain (costs off) select * from rlp where a is null;         /* while we're 
on nulls */
+explain (costs off) select * from rlp where a > 30;
+explain (costs off) select * from rlp where a = 30;    /* empty */
+explain (costs off) select * from rlp where a <= 31;
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p0 partition of mc3p for values from (minvalue, 0, 0) to (1, 
1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 
10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 
20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, 
maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 
10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 
20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to 
(maxvalue, 0, 0);
+
+explain (costs off) select * from mc3p where a = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 
35;
+explain (costs off) select * from mc3p where a > 10;
+explain (costs off) select * from mc3p where a >= 10;
+explain (costs off) select * from mc3p where a < 10;
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0;    /* 
empty */
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 
100;
+explain (costs off) select * from mc3p where a > 20;
+explain (costs off) select * from mc3p where a >= 20;
+
+-- XXX - '<>' clauses cannot be handled yet
+explain (costs off) select * from lp where a <> 'a';
+
+-- XXX - redundant clause elimination does not happen yet
+explain (costs off) select * from mc3p where a = 10 and a > 1;
+
+-- XXX - the OR clauses don't contribute to partition-pruning yet
+explain (costs off) select * from rlp3 where b = 'ab' or b = 'ef';
+
+drop table lp, rlp, mc3p;
-- 
2.11.0

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to