Hello,

With native partitioning landing in Postgres 10, we (Julien Rouhaud and 
myself) had the idea for the 
following simple optimisation. This simple optimisation comes from a real use 
case.

===== Proposal =====

With range partitioning, we guarantee that each partition contains non-
overlapping values. Since we know the range allowed for each partition, it is 
possible to sort them according to the partition key (as is done already for 
looking up partitions).

Thus, we ca generate sorted Append plans instead of MergeAppend when sorting 
occurs on the partition key.

For example, consider the following model and query:


CREATE TABLE parent (id int) PARTITION BY RANGE (id);
CREATE TABLE child2 PARTITION OF parent FOR VALUES FROM (10) TO (20);
CREATE TABLE child1 PARTITION OF parent FOR VALUES FROM (1) TO (10);
CREATE INDEX ON child1(id);
CREATE INDEX ON child2(id);



EXPLAIN (COSTS OFF) SELECT * FROM parent ORDER BY id desc;

                          QUERY PLAN                          
--------------------------------------------------------------
 Merge Append
   Sort Key: parent.id DESC
   ->  Sort
         Sort Key: parent.id DESC
         ->  Seq Scan on parent
   ->  Index Only Scan Backward using child2_id_idx on child2
   ->  Index Only Scan Backward using child1_id_idx on child

We can guarantee that every value found in child2 will have a greater id than 
any value found in child1. Thus, we could generate a plan like this:

                          QUERY PLAN                           
---------------------------------------------------------------
 Append
   Sort Key: child2.id DESC
   ->  Index Only Scan Backward using child2_id_idx1 on child2
   ->  Index Only Scan Backward using child1_id_idx1 on child1

Skipping the merge step altogether.

This has the added benefit of providing an "early stop": if we add a limit to 
the query, we will only scan the indexes that are necessary for fetching the 
required amount of tuples. This is especially useful if we add a WHERE clause 
not covered by the index with the limit. Consider the following example, with 
a table "webvisits" partitioned by a ts colum. If we want to retrieve the 
latest 100 hits from a specific ip:

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM webvisits WHERE ipaddr = 
'93.184.216.34' ORDER BY ts DESC LIMIT 10;

                                                            
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..32.32 rows=10 width=104) (actual time=0.080..0.109 rows=10 
loops=1)
   Buffers: shared hit=7
   ->  Append  (cost=0.43..401009.65 rows=125740 width=104) (actual 
time=0.078..0.105 rows=10 loops=1)
         Sort Key: webvisits_201712.ts DESC
         Buffers: shared hit=7
         ->  Index Scan Backward using webvisits_201712_ipaddr_ts_idx on 
webvisits_201712  (cost=0.43..133617.70 rows=41901 width=104) (actual 
time=0.077..0.101 rows=10 loops=1)
               Index Cond: (ipaddr = '93.184.216.34'::inet)
               Buffers: shared hit=7
         ->  Index Scan Backward using webvisits_201711_ipaddr_ts_idx on 
webvisits_201711  (cost=0.43..131514.18 rows=41243 width=104) (never executed)
               Index Cond: (ipaddr = '93.184.216.34'::inet)
         ->  Index Scan Backward using webvisits_201710_ipaddr_ts_idx on 
webvisits_201710  (cost=0.43..135801.71 rows=42587 width=104) (never executed)
             ........

We have developed a proof-of-concept implementation for this optimisation.

For sub partitioning, intermediate Append nodes can be generated.

Consider the following examples:


CREATE TABLE parentparent (c1 int, c2 int) PARTITION BY range (c1);

-- Subpartition only on second column
CREATE TABLE parent1 PARTITION OF parentparent FOR VALUES FROM (1) TO (10)
PARTITION BY range (c2);

-- Subpartition on both columns
CREATE TABLE parent2 PARTITION OF parentparent FOR VALUES FROM (10) TO (20)
PARTITION BY range (c1, c2);

CREATE TABLE child11 PARTITION OF parent1 FOR VALUES FROM (1) TO (10);
CREATE TABLE child12 PARTITION OF parent1 FOR VALUES FROM (10) TO (20);
CREATE TABLE child21 PARTITION OF parent2 FOR VALUES FROM (10, 1) TO (15, 10);
CREATE TABLE child22 PARTITION OF parent2 FOR VALUES FROM (15, 10) TO (20, 
20);

EXPLAIN (COSTS OFF) SELECT * FROM parentparent ORDER BY c1;
              QUERY PLAN               
---------------------------------------
 Append
   Sort Key: parent1.c1
   ->  Sort
         Sort Key: parent1.c1
         ->  Append
               ->  Seq Scan on parent1
               ->  Seq Scan on child11
               ->  Seq Scan on child12
   ->  Append
         Sort Key: child21.c1
         ->  Sort
               Sort Key: child21.c1
               ->  Seq Scan on child21
         ->  Sort
               Sort Key: child22.c1
               ->  Seq Scan on child22


EXPLAIN (COSTS OFF) SELECT * FROM parentparent ORDER BY c1, c2;
                   QUERY PLAN                   
------------------------------------------------
 Append
   Sort Key: parent1.c1, parent1.c2
   ->  Sort
         Sort Key: parent1.c1, parent1.c2
         ->  Append
               ->  Seq Scan on parent1
               ->  Seq Scan on child11
               ->  Seq Scan on child12
   ->  Append
         Sort Key: child21.c1, child21.c2
         ->  Sort
               Sort Key: child21.c1, child21.c2
               ->  Seq Scan on child21
         ->  Sort
               Sort Key: child22.c1, child22.c2
               ->  Seq Scan on child22



===== Patch design =====

First, we don't really know what we're doing :). But this is a proof of 
concept, and if there is interest in this patch, let us know how we should 
have done things and we're willing to rewrite it entirely if needed.

==== Overview ====

In the optimiser, we generate PathKeys representing the two sort orders by 
wich partition can be ordered (ASC and DESC), only for "native" range-based 
partitioned tables, and store them in RelOptInfo associated with every parent 
table, along with a list of OIDs corresponding to the partitions.

Then, when the time comes to generate append paths, we add another step which 
tries to match the query_pathkeys to those of the partition, and generate 
another append node with the sorted paths for each partitions, in the expected 
order, and a pathkey to the append path itself to indicate its already sorted.

==== Known Problems with the patch ====

Once again, keep in mind it's a proof-of-concept  

- It is in conflict with the existing patch designed to avoid adding the 
parent partitions
to the append node. As such, it will almost certainly need a full rewrite to 
adapt to that since that is done in this patch in a probably less than ideal 
fashion.
- As of now, the patch doesn't try to match the partition pathkey to 
mergejoinable pathkeys. 
- maybe a new node should be created altogether, instead of porting most of 
the MergeAppend struct fields to the basic Append ?
- the way we store the PathKeys and partitions OID directly in RelOptInfo is 
probably wrong
- the major impact on existing queries is the fact that to handle sub-
partitioning, RangeTblEntries representing intermediate append nodes are added 
(but just in the case of native partitioning, regular inheritance is not 
affected). See updated prepunion.c for what that means. Those RTE are then 
ignored 
when generating the real Append nodes.
- new regression tests have not been written yet, and existing ones haven't 
been "fixed" to take into account the different partition ordering: in case of 
sub partitioning, the order is now "depth-first" rather than "breadth-first" 
like it was earlier.
- no documentation has been added, although we don't really know where it 
should go
- the patch lacks a lot of comments
- the key displayed in EXPLAIN output comes from the first partition, not the 
parent.
- maybe a "real" SortPath should be generated, instead of generating Sort 
nodes 
out of nowhere when creating the plan. This has been done to conform to what 
MergeAppend already did.

===== Questions =====

Is there interest for such an optimization ?
Should we work from the patch proposed to remove parent tables already ?
Should there be a new Node for such a "SortedAppend" operation or is it fine 
keeping it with the Append node already ? Should that instead be an 
optimization of the MergeAppend Node ?
What is or is not acceptable with regards to storing things in RelOptInfo 
(among others) ?

More generally, any kind of feedback that will help us get on the right track 
will be appreciated.


-- 
Ronan Dunklau & Julien Rouhaud
http://dalibo.com - http://dalibo.org
commit 21e09fb1038b0fb48f32a9013bee64279af8dfd7
Author: Ronan Dunklau <ronan.dunk...@dalibo.com>
Date:   Tue Feb 28 09:00:44 2017 +0100

    Consider sorting order of partitions for append nodes

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c9b55ead3d..e566928f6c 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -81,6 +81,8 @@ static void show_sort_keys(SortState *sortstate, List *ancestors,
 			   ExplainState *es);
 static void show_merge_append_keys(MergeAppendState *mstate, List *ancestors,
 					   ExplainState *es);
+static void show_append_keys(AppendState *mstate, List *ancestors,
+					   ExplainState *es);
 static void show_agg_keys(AggState *astate, List *ancestors,
 			  ExplainState *es);
 static void show_grouping_sets(PlanState *planstate, Agg *agg,
@@ -1561,6 +1563,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			show_sort_keys(castNode(SortState, planstate), ancestors, es);
 			show_sort_info(castNode(SortState, planstate), es);
 			break;
+		case T_Append:
+			show_append_keys(castNode(AppendState, planstate),
+								   ancestors, es);
+			break;
 		case T_MergeAppend:
 			show_merge_append_keys(castNode(MergeAppendState, planstate),
 								   ancestors, es);
@@ -1703,7 +1709,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 							   ancestors, es);
 			break;
 		case T_MergeAppend:
-			ExplainMemberNodes(((MergeAppend *) plan)->mergeplans,
+			ExplainMemberNodes(((MergeAppend*) plan)->plan.appendplans,
 							   ((MergeAppendState *) planstate)->mergeplans,
 							   ancestors, es);
 			break;
@@ -1901,7 +1907,23 @@ static void
 show_merge_append_keys(MergeAppendState *mstate, List *ancestors,
 					   ExplainState *es)
 {
-	MergeAppend *plan = (MergeAppend *) mstate->ps.plan;
+	Append *plan = (Append *) mstate->ps.plan;
+
+	show_sort_group_keys((PlanState *) mstate, "Sort Key",
+						 plan->numCols, plan->sortColIdx,
+						 plan->sortOperators, plan->collations,
+						 plan->nullsFirst,
+						 ancestors, es);
+}
+
+/*
+ * Likewise, for an Append node.
+ */
+static void
+show_append_keys(AppendState *mstate, List *ancestors,
+					   ExplainState *es)
+{
+	Append *plan = (Append *) mstate->ps.plan;
 
 	show_sort_group_keys((PlanState *) mstate, "Sort Key",
 						 plan->numCols, plan->sortColIdx,
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 7a20bf07a4..a586088211 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -63,6 +63,7 @@ MergeAppendState *
 ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
 {
 	MergeAppendState *mergestate = makeNode(MergeAppendState);
+	Append			*append = &node->plan;
 	PlanState **mergeplanstates;
 	int			nplans;
 	int			i;
@@ -74,7 +75,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
 	/*
 	 * Set up empty vector of subplan states
 	 */
-	nplans = list_length(node->mergeplans);
+	nplans = list_length(append->appendplans);
 
 	mergeplanstates = (PlanState **) palloc0(nplans * sizeof(PlanState *));
 
@@ -108,7 +109,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
 	 * results into the array "mergeplans".
 	 */
 	i = 0;
-	foreach(lc, node->mergeplans)
+	foreach(lc, append->appendplans)
 	{
 		Plan	   *initNode = (Plan *) lfirst(lc);
 
@@ -125,17 +126,17 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
 	/*
 	 * initialize sort-key information
 	 */
-	mergestate->ms_nkeys = node->numCols;
-	mergestate->ms_sortkeys = palloc0(sizeof(SortSupportData) * node->numCols);
+	mergestate->ms_nkeys = append->numCols;
+	mergestate->ms_sortkeys = palloc0(sizeof(SortSupportData) * append->numCols);
 
-	for (i = 0; i < node->numCols; i++)
+	for (i = 0; i < append->numCols; i++)
 	{
 		SortSupport sortKey = mergestate->ms_sortkeys + i;
 
 		sortKey->ssup_cxt = CurrentMemoryContext;
-		sortKey->ssup_collation = node->collations[i];
-		sortKey->ssup_nulls_first = node->nullsFirst[i];
-		sortKey->ssup_attno = node->sortColIdx[i];
+		sortKey->ssup_collation = append->collations[i];
+		sortKey->ssup_nulls_first = append->nullsFirst[i];
+		sortKey->ssup_attno = append->sortColIdx[i];
 
 		/*
 		 * It isn't feasible to perform abbreviated key conversion, since
@@ -146,7 +147,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
 		 */
 		sortKey->abbreviate = false;
 
-		PrepareSortSupportFromOrderingOp(node->sortOperators[i], sortKey);
+		PrepareSortSupportFromOrderingOp(append->sortOperators[i], sortKey);
 	}
 
 	/*
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 25fd051d6e..2535ca0eec 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -219,6 +219,18 @@ _copyModifyTable(const ModifyTable *from)
 	return newnode;
 }
 
+static void
+copyAppendFields(const Append *from, Append *newnode)
+{
+	CopyPlanFields((const Plan *) from, (Plan *) newnode);
+	COPY_NODE_FIELD(appendplans);
+	COPY_SCALAR_FIELD(numCols);
+	COPY_POINTER_FIELD(sortColIdx, from->numCols * sizeof(AttrNumber));
+	COPY_POINTER_FIELD(sortOperators, from->numCols * sizeof(Oid));
+	COPY_POINTER_FIELD(collations, from->numCols * sizeof(Oid));
+	COPY_POINTER_FIELD(nullsFirst, from->numCols * sizeof(bool));
+}
+
 /*
  * _copyAppend
  */
@@ -226,17 +238,7 @@ static Append *
 _copyAppend(const Append *from)
 {
 	Append	   *newnode = makeNode(Append);
-
-	/*
-	 * copy node superclass fields
-	 */
-	CopyPlanFields((const Plan *) from, (Plan *) newnode);
-
-	/*
-	 * copy remainder of node
-	 */
-	COPY_NODE_FIELD(appendplans);
-
+	copyAppendFields(from, newnode);
 	return newnode;
 }
 
@@ -247,22 +249,7 @@ static MergeAppend *
 _copyMergeAppend(const MergeAppend *from)
 {
 	MergeAppend *newnode = makeNode(MergeAppend);
-
-	/*
-	 * copy node superclass fields
-	 */
-	CopyPlanFields((const Plan *) from, (Plan *) newnode);
-
-	/*
-	 * copy remainder of node
-	 */
-	COPY_NODE_FIELD(mergeplans);
-	COPY_SCALAR_FIELD(numCols);
-	COPY_POINTER_FIELD(sortColIdx, from->numCols * sizeof(AttrNumber));
-	COPY_POINTER_FIELD(sortOperators, from->numCols * sizeof(Oid));
-	COPY_POINTER_FIELD(collations, from->numCols * sizeof(Oid));
-	COPY_POINTER_FIELD(nullsFirst, from->numCols * sizeof(bool));
-
+	copyAppendFields((const Append *)from, (Append *)newnode);
 	return newnode;
 }
 
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 6e52eb7231..25a1822c18 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3743,7 +3743,7 @@ planstate_tree_walker(PlanState *planstate,
 				return true;
 			break;
 		case T_MergeAppend:
-			if (planstate_walk_members(((MergeAppend *) plan)->mergeplans,
+			if (planstate_walk_members(((MergeAppend *) plan)->plan.appendplans,
 								((MergeAppendState *) planstate)->mergeplans,
 									   walker, context))
 				return true;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 7418fbeded..af3c09e44b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -362,26 +362,11 @@ _outModifyTable(StringInfo str, const ModifyTable *node)
 }
 
 static void
-_outAppend(StringInfo str, const Append *node)
+_outAppendInfo(StringInfo str, const Append *node)
 {
-	WRITE_NODE_TYPE("APPEND");
-
+	int i;
 	_outPlanInfo(str, (const Plan *) node);
-
 	WRITE_NODE_FIELD(appendplans);
-}
-
-static void
-_outMergeAppend(StringInfo str, const MergeAppend *node)
-{
-	int			i;
-
-	WRITE_NODE_TYPE("MERGEAPPEND");
-
-	_outPlanInfo(str, (const Plan *) node);
-
-	WRITE_NODE_FIELD(mergeplans);
-
 	WRITE_INT_FIELD(numCols);
 
 	appendStringInfoString(str, " :sortColIdx");
@@ -399,6 +384,23 @@ _outMergeAppend(StringInfo str, const MergeAppend *node)
 	appendStringInfoString(str, " :nullsFirst");
 	for (i = 0; i < node->numCols; i++)
 		appendStringInfo(str, " %s", booltostr(node->nullsFirst[i]));
+
+}
+
+static void
+_outAppend(StringInfo str, const Append *node)
+{
+	WRITE_NODE_TYPE("APPEND");
+	_outAppendInfo(str, node);
+}
+
+
+
+static void
+_outMergeAppend(StringInfo str, const MergeAppend *node)
+{
+	WRITE_NODE_TYPE("MERGEAPPEND");
+	_outAppendInfo(str, (const Append *) &node->plan);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index d3bbc02f24..771aec38d3 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1554,17 +1554,31 @@ _readModifyTable(void)
 	READ_DONE();
 }
 
+
+static void
+ReadCommonAppend(Append* local_node)
+{
+	READ_TEMP_LOCALS();
+
+	ReadCommonPlan(&local_node->plan);
+
+	READ_NODE_FIELD(appendplans);
+	READ_INT_FIELD(numCols);
+	READ_ATTRNUMBER_ARRAY(sortColIdx, local_node->numCols);
+	READ_OID_ARRAY(sortOperators, local_node->numCols);
+	READ_OID_ARRAY(collations, local_node->numCols);
+	READ_BOOL_ARRAY(nullsFirst, local_node->numCols);
+}
+
 /*
  * _readAppend
  */
 static Append *
 _readAppend(void)
 {
-	READ_LOCALS(Append);
-
-	ReadCommonPlan(&local_node->plan);
+	READ_LOCALS_NO_FIELDS(Append);
 
-	READ_NODE_FIELD(appendplans);
+	ReadCommonAppend(local_node);
 
 	READ_DONE();
 }
@@ -1575,16 +1589,9 @@ _readAppend(void)
 static MergeAppend *
 _readMergeAppend(void)
 {
-	READ_LOCALS(MergeAppend);
-
-	ReadCommonPlan(&local_node->plan);
+	READ_LOCALS_NO_FIELDS(MergeAppend);
 
-	READ_NODE_FIELD(mergeplans);
-	READ_INT_FIELD(numCols);
-	READ_ATTRNUMBER_ARRAY(sortColIdx, local_node->numCols);
-	READ_OID_ARRAY(sortOperators, local_node->numCols);
-	READ_OID_ARRAY(collations, local_node->numCols);
-	READ_BOOL_ARRAY(nullsFirst, local_node->numCols);
+	ReadCommonAppend(&local_node->plan);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 43bfd23804..851a6facdb 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -26,6 +26,7 @@
 #include "foreign/fdwapi.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "miscadmin.h"
 #ifdef OPTIMIZER_DEBUG
 #include "nodes/print.h"
 #endif
@@ -93,6 +94,9 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 					Index rti, RangeTblEntry *rte);
 static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 						Index rti, RangeTblEntry *rte);
+static int indexof_path_relation_in_oid_list(Oid oid, Oid* sorted_oids, int array_size);
+static void generate_sorted_append_paths(PlannerInfo *root, RelOptInfo *parent_rel, List *ordered_childrels);
+
 static void generate_mergeappend_paths(PlannerInfo *root, RelOptInfo *rel,
 						   List *live_childrels,
 						   List *all_child_pathkeys);
@@ -1185,6 +1189,24 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	int			parentRTindex = rti;
 	List	   *live_childrels = NIL;
 	ListCell   *l;
+	/* If the parent table is partitioned, sort the childrels according to
+	 * the partitioning ASC ordering */
+	int currentidx = 0;
+	bool is_partitioned = rel->rel_sorted_part_oids != NIL;
+	int		    num_parts = list_length(rel->rel_sorted_part_oids);
+	Oid		    *parts_oids = palloc0(sizeof(Oid) * num_parts);
+	RelOptInfo	**ordered_partition_rels = palloc0(sizeof(RelOptInfo*) * num_parts);
+	/* Transform the partition's oid list into an array */
+	if(is_partitioned)
+	{
+		ListCell *lc;
+		int i = 0;
+		foreach(lc, rel->rel_sorted_part_oids)
+		{
+			parts_oids[i] = lfirst_oid(lc);
+			i++;
+		}
+	}
 
 	/*
 	 * Generate access paths for each member relation, and remember the
@@ -1226,10 +1248,39 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 		if (IS_DUMMY_REL(childrel))
 			continue;
 
+
+
 		/*
-		 * Child is live, so add it to the live_childrels list for use below.
+		 * Child is live, so add it to the ordered_partition_rel
+		 * For partitioned cases, add the RelOptInfo in the right position of
+		 * the sorted array. Parent partition table is guaranted to be empty,
+		 * so exclude it. In the general case, just add it directly to the
+		 * resulting list
 		 */
-		live_childrels = lappend(live_childrels, childrel);
+		if ( is_partitioned ) {
+			/* Exclude childRTE generated to make sure the parent table would be
+			 * scanned */
+			if(childRTE->relid != rte->relid)
+			{
+				int partindex = indexof_path_relation_in_oid_list(childRTE->relid, parts_oids, num_parts);
+				ordered_partition_rels[partindex] = childrel;
+			}
+		}
+		else
+			live_childrels = lappend(live_childrels, childrel);
+
+
+		currentidx++;
+	}
+
+	/* Finally, build the live_childrels list in the ASC order
+	 * of the partition key */
+	if ( is_partitioned ) {
+		int i;
+		for(i = 0; i < list_length(rel->rel_sorted_part_oids); i++)
+		{
+			live_childrels = lappend(live_childrels, ordered_partition_rels[i]);
+		}
 	}
 
 	/* Add paths to the "append" relation. */
@@ -1260,6 +1311,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 	List	   *all_child_outers = NIL;
 	ListCell   *l;
 
+
+
 	/*
 	 * For every non-dummy child, remember the cheapest path.  Also, identify
 	 * all pathkeys (orderings) and parameterizations (required_outer sets)
@@ -1288,6 +1341,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 		else
 			partial_subpaths_valid = false;
 
+
+
 		/*
 		 * Collect lists of all the available path orderings and
 		 * parameterizations for all the children.  We use these as a
@@ -1359,14 +1414,16 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 	 * if we have zero or one live subpath due to constraint exclusion.)
 	 */
 	if (subpaths_valid)
-		add_path(rel, (Path *) create_append_path(rel, subpaths, NULL, 0));
+		add_path(rel, (Path *) create_append_path(root, rel, subpaths, NULL, 0, NULL));
 
+	/* If we are partitioned, build ordered append path matching the
+	 * PathKeys derived from the partition key */
+	generate_sorted_append_paths(root, rel, live_childrels);
 	/*
 	 * Consider an append of partial unordered, unparameterized partial paths.
 	 */
 	if (partial_subpaths_valid)
 	{
-		AppendPath *appendpath;
 		ListCell   *lc;
 		int			parallel_workers = 0;
 
@@ -1385,9 +1442,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 		Assert(parallel_workers > 0);
 
 		/* Generate a partial append path. */
-		appendpath = create_append_path(rel, partial_subpaths, NULL,
-										parallel_workers);
-		add_partial_path(rel, (Path *) appendpath);
+		add_partial_path(rel, (Path *) create_append_path(root, rel, partial_subpaths, NULL,
+										parallel_workers, NULL));
 	}
 
 	/*
@@ -1396,7 +1452,7 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 	 */
 	if (subpaths_valid)
 		generate_mergeappend_paths(root, rel, live_childrels,
-								   all_child_pathkeys);
+	    							   all_child_pathkeys);
 
 	/*
 	 * Build Append paths for each parameterization seen among the child rels.
@@ -1438,10 +1494,124 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 
 		if (subpaths_valid)
 			add_path(rel, (Path *)
-					 create_append_path(rel, subpaths, required_outer, 0));
+					 create_append_path(root, rel, subpaths, required_outer, 0, NULL));
 	}
 }
 
+static int
+indexof_path_relation_in_oid_list(Oid oid, Oid* sorted_oids, int array_size)
+{
+	int i;
+
+	for(i=0; i < array_size; i++)
+	{
+		if(sorted_oids[i] == oid)
+			return i;
+	}
+
+	return -1;
+}
+
+
+static void
+generate_sorted_append_paths(PlannerInfo *root, RelOptInfo *parent_rel, List *ordered_childrels)
+{
+	ListCell *lc;
+	List *partitions_asc = ordered_childrels;
+	List *partitions_desc = NIL;
+	RangeTblEntry * parent_rte = planner_rt_fetch(parent_rel->relid, root);
+
+	if(parent_rte->relkind != RELKIND_PARTITIONED_TABLE)
+		return;
+
+	foreach(lc, partitions_asc)
+	{
+		partitions_desc = lcons(lfirst(lc), partitions_desc);
+	}
+
+	foreach(lc, parent_rel->rel_partitioned_pathkeys)
+	{
+		List *pathkeys = (List *) lfirst(lc);
+		PathKey *first = (PathKey *) linitial(pathkeys);
+		List *ordered_partitions = first->pk_strategy == BTLessStrategyNumber ?
+			partitions_asc : partitions_desc;
+		List *startup_subpaths = NIL;
+		List *total_subpaths = NIL;
+		List *sequential_subpaths = NIL;
+		bool startup_neq_total = false;
+		ListCell *lc2;
+		if(compare_pathkeys(pathkeys, root->query_pathkeys) == PATHKEYS_DIFFERENT)
+		{
+			continue;
+		}
+		foreach(lc2, ordered_partitions)
+		{
+			RelOptInfo *childrel = lfirst(lc2);
+			Path *cheapest_startup,
+				 *cheapest_total,
+				 *sequential = NULL;
+			/* The partition may have been pruned */
+			if (!childrel)
+				continue;
+
+			cheapest_startup = get_cheapest_path_for_pathkeys(childrel->pathlist,
+					root->query_pathkeys,
+					NULL,
+					STARTUP_COST, false);
+			cheapest_total = get_cheapest_path_for_pathkeys(childrel->pathlist,
+					root->query_pathkeys,
+					NULL,
+					TOTAL_COST, false);
+
+			/*
+			 * If we can't find any paths with the right order just use the
+			 * cheapest-total path; we'll have to sort it later.
+			 */
+			if (cheapest_startup == NULL || cheapest_total == NULL)
+			{
+				cheapest_startup = cheapest_total =
+					childrel->cheapest_total_path;
+				/* Assert we do have an unparameterized path for this child */
+				Assert(cheapest_total->param_info == NULL);
+			}
+			/*
+			 * Force a an unordered path, which could be cheaper in corner cases where
+			 * orderedpaths are too expensive.
+			 */
+			sequential = childrel->cheapest_total_path;
+
+			/*
+			 * Notice whether we actually have different paths for the
+			 * "cheapest" and "total" cases; frequently there will be no point
+			 * in two create_merge_append_path() calls.
+			 */
+			if (cheapest_startup != cheapest_total)
+				startup_neq_total = true;
+			startup_subpaths =
+				lappend(startup_subpaths, cheapest_startup);
+			total_subpaths =
+				lappend(total_subpaths, cheapest_total);
+			sequential_subpaths =
+				lappend(sequential_subpaths, sequential);
+
+		}
+		if(startup_subpaths)
+		{
+			add_path(parent_rel, (Path *) create_append_path(root, parent_rel, startup_subpaths,
+						NULL, 0, root->query_pathkeys));
+		}
+		if (startup_neq_total)
+			add_path(parent_rel, (Path *) create_append_path(root,
+					parent_rel, total_subpaths, NULL, 0, root->query_pathkeys));
+		if (sequential_subpaths){
+			add_path(parent_rel, (Path *) create_append_path(root,
+					parent_rel, sequential_subpaths, NULL, 0, root->query_pathkeys));
+		}
+	}
+}
+
+
+
 /*
  * generate_mergeappend_paths
  *		Generate MergeAppend paths for an append relation
@@ -1670,8 +1840,7 @@ set_dummy_rel_pathlist(RelOptInfo *rel)
 	/* Discard any pre-existing paths; no further need for them */
 	rel->pathlist = NIL;
 	rel->partial_pathlist = NIL;
-
-	add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0));
+	add_path(rel, (Path *) create_append_path(NULL, rel, NIL, NULL, 0, NULL));
 
 	/*
 	 * We set the cheapest path immediately, to ensure that IS_DUMMY_REL()
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 0551668976..5b3c29b035 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1217,7 +1217,7 @@ mark_dummy_rel(RelOptInfo *rel)
 	rel->partial_pathlist = NIL;
 
 	/* Set up the dummy path */
-	add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0));
+	add_path(rel, (Path *) create_append_path(NULL, rel, NIL, NULL, 0, NIL));
 
 	/* Set or update cheapest_total_path and related fields */
 	set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 89e1946fc2..66d3ac6273 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -78,8 +78,9 @@ static List *get_gating_quals(PlannerInfo *root, List *quals);
 static Plan *create_gating_plan(PlannerInfo *root, Path *path, Plan *plan,
 				   List *gating_quals);
 static Plan *create_join_plan(PlannerInfo *root, JoinPath *best_path);
-static Plan *create_append_plan(PlannerInfo *root, AppendPath *best_path);
-static Plan *create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path);
+static Plan *create_append_plan(NodeTag node_type, PlannerInfo *root, AppendPath *best_path);
+static Plan *wrap_sort(PlannerInfo *root, Append* parentplan, Path *subpath, List* pathkeys, double limit_tuples);
+
 static Result *create_result_plan(PlannerInfo *root, ResultPath *best_path);
 static ProjectSet *create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path);
 static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path,
@@ -199,7 +200,7 @@ static CteScan *make_ctescan(List *qptlist, List *qpqual,
 			 Index scanrelid, int ctePlanId, int cteParam);
 static WorkTableScan *make_worktablescan(List *qptlist, List *qpqual,
 				   Index scanrelid, int wtParam);
-static Append *make_append(List *appendplans, List *tlist);
+static Append *make_append(NodeTag node_type, List *tlist);
 static RecursiveUnion *make_recursive_union(List *tlist,
 					 Plan *lefttree,
 					 Plan *righttree,
@@ -377,12 +378,9 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 									(JoinPath *) best_path);
 			break;
 		case T_Append:
-			plan = create_append_plan(root,
-									  (AppendPath *) best_path);
-			break;
 		case T_MergeAppend:
-			plan = create_merge_append_plan(root,
-											(MergeAppendPath *) best_path);
+			plan = create_append_plan(best_path->pathtype, root,
+									  (AppendPath *) best_path);
 			break;
 		case T_Result:
 			if (IsA(best_path, ProjectionPath))
@@ -976,13 +974,15 @@ create_join_plan(PlannerInfo *root, JoinPath *best_path)
  *	  Returns a Plan node.
  */
 static Plan *
-create_append_plan(PlannerInfo *root, AppendPath *best_path)
+create_append_plan(NodeTag node_type, PlannerInfo *root, AppendPath *best_path)
 {
-	Append	   *plan;
+	Append	   *node;
+	Plan 	   *plan;
 	List	   *tlist = build_path_tlist(root, &best_path->path);
+	List	   *pathkeys = best_path->path.pathkeys;
 	List	   *subplans = NIL;
 	ListCell   *subpaths;
-
+	double limit_tuples = best_path->limit_tuples;
 	/*
 	 * The subpaths list could be empty, if every child was proven empty by
 	 * constraint exclusion.  In that case generate a dummy plan that returns
@@ -994,9 +994,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
 	 */
 	if (best_path->subpaths == NIL)
 	{
-		/* Generate a Result plan with constant-FALSE gating qual */
-		Plan	   *plan;
-
 		plan = (Plan *) make_result(tlist,
 									(Node *) list_make1(makeBoolConst(false,
 																	  false)),
@@ -1006,63 +1003,10 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
 
 		return plan;
 	}
-
-	/* Build the plan for each child */
-	foreach(subpaths, best_path->subpaths)
-	{
-		Path	   *subpath = (Path *) lfirst(subpaths);
-		Plan	   *subplan;
-
-		/* Must insist that all children return the same tlist */
-		subplan = create_plan_recurse(root, subpath, CP_EXACT_TLIST);
-
-		subplans = lappend(subplans, subplan);
-	}
-
-	/*
-	 * XXX ideally, if there's just one child, we'd not bother to generate an
-	 * Append node but just return the single child.  At the moment this does
-	 * not work because the varno of the child scan plan won't match the
-	 * parent-rel Vars it'll be asked to emit.
-	 */
-
-	plan = make_append(subplans, tlist);
-
-	copy_generic_path_info(&plan->plan, (Path *) best_path);
-
-	return (Plan *) plan;
-}
-
-/*
- * create_merge_append_plan
- *	  Create a MergeAppend plan for 'best_path' and (recursively) plans
- *	  for its subpaths.
- *
- *	  Returns a Plan node.
- */
-static Plan *
-create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path)
-{
-	MergeAppend *node = makeNode(MergeAppend);
-	Plan	   *plan = &node->plan;
-	List	   *tlist = build_path_tlist(root, &best_path->path);
-	List	   *pathkeys = best_path->path.pathkeys;
-	List	   *subplans = NIL;
-	ListCell   *subpaths;
-
-	/*
-	 * We don't have the actual creation of the MergeAppend node split out
-	 * into a separate make_xxx function.  This is because we want to run
-	 * prepare_sort_from_pathkeys on it before we do so on the individual
-	 * child plans, to make cross-checking the sort info easier.
-	 */
+	node = make_append(node_type, tlist);
+	plan = &node->plan;
 	copy_generic_path_info(plan, (Path *) best_path);
 	plan->targetlist = tlist;
-	plan->qual = NIL;
-	plan->lefttree = NULL;
-	plan->righttree = NULL;
-
-	/* Compute sort column info, and adjust MergeAppend's tlist as needed */
 	(void) prepare_sort_from_pathkeys(plan, pathkeys,
 									  best_path->path.parent->relids,
 									  NULL,
@@ -1073,72 +1017,21 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path)
 									  &node->collations,
 									  &node->nullsFirst);
 
-	/*
-	 * Now prepare the child plans.  We must apply prepare_sort_from_pathkeys
-	 * even to subplans that don't need an explicit sort, to make sure they
-	 * are returning the same sort key columns the MergeAppend expects.
-	 */
+	/* Build the plan for each child */
 	foreach(subpaths, best_path->subpaths)
 	{
 		Path	   *subpath = (Path *) lfirst(subpaths);
-		Plan	   *subplan;
-		int			numsortkeys;
-		AttrNumber *sortColIdx;
-		Oid		   *sortOperators;
-		Oid		   *collations;
-		bool	   *nullsFirst;
-
-		/* Build the child plan */
-		/* Must insist that all children return the same tlist */
-		subplan = create_plan_recurse(root, subpath, CP_EXACT_TLIST);
-
-		/* Compute sort column info, and adjust subplan's tlist as needed */
-		subplan = prepare_sort_from_pathkeys(subplan, pathkeys,
-											 subpath->parent->relids,
-											 node->sortColIdx,
-											 false,
-											 &numsortkeys,
-											 &sortColIdx,
-											 &sortOperators,
-											 &collations,
-											 &nullsFirst);
-
-		/*
-		 * Check that we got the same sort key information.  We just Assert
-		 * that the sortops match, since those depend only on the pathkeys;
-		 * but it seems like a good idea to check the sort column numbers
-		 * explicitly, to ensure the tlists really do match up.
-		 */
-		Assert(numsortkeys == node->numCols);
-		if (memcmp(sortColIdx, node->sortColIdx,
-				   numsortkeys * sizeof(AttrNumber)) != 0)
-			elog(ERROR, "MergeAppend child's targetlist doesn't match MergeAppend");
-		Assert(memcmp(sortOperators, node->sortOperators,
-					  numsortkeys * sizeof(Oid)) == 0);
-		Assert(memcmp(collations, node->collations,
-					  numsortkeys * sizeof(Oid)) == 0);
-		Assert(memcmp(nullsFirst, node->nullsFirst,
-					  numsortkeys * sizeof(bool)) == 0);
-
-		/* Now, insert a Sort node if subplan isn't sufficiently ordered */
-		if (!pathkeys_contained_in(pathkeys, subpath->pathkeys))
-		{
-			Sort	   *sort = make_sort(subplan, numsortkeys,
-										 sortColIdx, sortOperators,
-										 collations, nullsFirst);
-
-			label_sort_with_costsize(root, sort, best_path->limit_tuples);
-			subplan = (Plan *) sort;
-		}
-
+		/* TODO: decrease limit tuples for each subpath */
+		Plan *subplan = plan = wrap_sort(root, node, subpath, pathkeys, limit_tuples);
+		if(limit_tuples > 0)
+			limit_tuples = Max(1, limit_tuples - subpath->rows);
 		subplans = lappend(subplans, subplan);
 	}
-
-	node->mergeplans = subplans;
-
+	node->appendplans = subplans;
 	return (Plan *) node;
 }
 
+
 /*
  * create_result_plan
  *	  Create a Result plan for 'best_path'.
@@ -1537,7 +1430,7 @@ create_projection_plan(PlannerInfo *root, ProjectionPath *best_path)
 	 * anyway.  Usually create_projection_path will have detected that and set
 	 * dummypp if we don't need a Result; but its decision can't be final,
 	 * because some createplan.c routines change the tlists of their nodes.
-	 * (An example is that create_merge_append_plan might add resjunk sort
+	 * (An example is that create_append_plan might add resjunk sort
 	 * columns to a MergeAppend.)  So we have to recheck here.  If we do
 	 * arrive at a different answer than create_projection_path did, we'll
 	 * have made slightly wrong cost estimates; but label the plan with the
@@ -5161,20 +5054,85 @@ make_foreignscan(List *qptlist,
 }
 
 static Append *
-make_append(List *appendplans, List *tlist)
+make_append(NodeTag node_type, List *tlist)
 {
-	Append	   *node = makeNode(Append);
-	Plan	   *plan = &node->plan;
+	Append	   *node;
+	Plan	   *plan;
+	if(node_type != T_Append && node_type != T_MergeAppend)
+		elog(ERROR, "create_append_plan can only create Append or MergeAppend plans");
 
-	plan->targetlist = tlist;
+	switch(node_type)
+	{
+		case T_Append:
+			node = makeNode(Append);
+			break;
+		case T_MergeAppend:
+			node = (Append *) makeNode(MergeAppend);
+			break;
+		default:
+			elog(ERROR, "create_append_plan can only create Append or MergeAppend plans");
+	}
+
+	plan = &node->plan;  	plan->targetlist = tlist;
 	plan->qual = NIL;
 	plan->lefttree = NULL;
 	plan->righttree = NULL;
-	node->appendplans = appendplans;
-
+	node->appendplans = NIL;
+	node->numCols = 0;
+	node->sortColIdx = NULL;
+	node->sortOperators = NULL;
+	node->collations = NULL;
+	node->nullsFirst = NULL;
 	return node;
 }
 
+static Plan *
+wrap_sort(PlannerInfo *root, Append* parentplan, Path *subpath, List* pathkeys, double limit_tuples)
+{
+	int			numCols;
+	AttrNumber *sortColIdx;
+	Oid		   *sortOperators;
+	Oid		   *collations;
+	bool	   *nullsFirst;
+	Plan	   *subplan;
+	subplan = create_plan_recurse(root, subpath, CP_EXACT_TLIST);
+	if(pathkeys != NIL)
+	{
+		subplan = prepare_sort_from_pathkeys(subplan, pathkeys,
+											 subpath->parent->relids,
+											 parentplan->sortColIdx,
+											 false,
+											 &numCols,
+											 &sortColIdx,
+											 &sortOperators,
+											 &collations,
+											 &nullsFirst);
+		Assert(numCols == parentplan->numCols);
+		if (memcmp(sortColIdx, parentplan->sortColIdx,
+					numCols * sizeof(AttrNumber)) != 0)
+			elog(ERROR, "MergeAppend child's targetlist doesn't match MergeAppend");
+		Assert(memcmp(sortOperators, parentplan->sortOperators,
+					numCols * sizeof(Oid)) == 0);
+		Assert(memcmp(collations, parentplan->collations,
+					numCols * sizeof(Oid)) == 0);
+		Assert(memcmp(nullsFirst, parentplan->nullsFirst,
+					numCols * sizeof(bool)) == 0);
+		/* Now, insert a Sort node if subplan isn't sufficiently ordered */
+		if (!pathkeys_contained_in(pathkeys, subpath->pathkeys))
+		{
+			Sort	   *sort = make_sort(subplan, numCols,
+										 sortColIdx, sortOperators,
+										 collations, nullsFirst);
+
+			label_sort_with_costsize(root, sort, limit_tuples);
+			subplan = (Plan *) sort;
+		}
+
+	}
+	return subplan;
+
+}
+
 static RecursiveUnion *
 make_recursive_union(List *tlist,
 					 Plan *lefttree,
@@ -5587,6 +5545,7 @@ prepare_sort_from_pathkeys(Plan *lefttree, List *pathkeys,
 					break;		/* found usable expression */
 				}
 			}
+
 			if (!j)
 				elog(ERROR, "could not find pathkey item to sort");
 
@@ -6088,7 +6047,6 @@ make_unique_from_pathkeys(Plan *lefttree, List *pathkeys, int numCols)
 				tle = NULL;
 			}
 		}
-
 		if (!tle)
 			elog(ERROR, "could not find pathkey item to sort");
 
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 3c58d0596c..43d3a57e60 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -25,6 +25,7 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/placeholder.h"
+#include "optimizer/plancat.h"
 #include "optimizer/planmain.h"
 
 
@@ -149,6 +150,7 @@ query_planner(PlannerInfo *root, List *tlist,
 	 */
 	build_base_rel_tlists(root, tlist);
 
+
 	find_placeholders_in_jointree(root);
 
 	find_lateral_references(root);
@@ -176,6 +178,14 @@ query_planner(PlannerInfo *root, List *tlist,
 	 */
 	(*qp_callback) (root, qp_extra);
 
+
+	/*
+	 * We consider generating pathkeys for partitionned tables only if the
+	 * query has some ordering
+	 */
+	if (root->query_pathkeys != NIL)
+		generate_pathkeys_for_partitioned_tables(root);
+
 	/*
 	 * Examine any "placeholder" expressions generated during subquery pullup.
 	 * Make sure that the Vars they need are marked as needed at the relevant
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 02286d9c52..2e5f23182c 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3345,10 +3345,12 @@ create_grouping_paths(PlannerInfo *root,
 				paths = lappend(paths, path);
 			}
 			path = (Path *)
-				create_append_path(grouped_rel,
+				create_append_path(root,
+								   grouped_rel,
 								   paths,
 								   NULL,
-								   0);
+								   0,
+								   NULL);
 			path->pathtarget = target;
 		}
 		else
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 5f3027e96f..ec579c7141 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -892,8 +892,8 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 				 * targetlist or check quals.
 				 */
 				set_dummy_tlist_references(plan, rtoffset);
-				Assert(splan->plan.qual == NIL);
-				foreach(l, splan->mergeplans)
+				Assert(splan->plan.plan.qual == NIL);
+				foreach(l, splan->plan.appendplans)
 				{
 					lfirst(l) = set_plan_refs(root,
 											  (Plan *) lfirst(l),
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 6fa6540662..c61238e22d 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2565,7 +2565,7 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			{
 				ListCell   *l;
 
-				foreach(l, ((MergeAppend *) plan)->mergeplans)
+				foreach(l, ((MergeAppend *) plan)->plan.appendplans)
 				{
 					context.paramids =
 						bms_add_members(context.paramids,
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 1389db18ba..077c179349 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -566,7 +566,7 @@ generate_union_path(SetOperationStmt *op, PlannerInfo *root,
 	/*
 	 * Append the child results together.
 	 */
-	path = (Path *) create_append_path(result_rel, pathlist, NULL, 0);
+	path = (Path *) create_append_path(root, result_rel, pathlist, NULL, 0, NIL);
 
 	/* We have to manually jam the right tlist into the path; ick */
 	path->pathtarget = create_pathtarget(root, tlist);
@@ -678,7 +678,7 @@ generate_nonunion_path(SetOperationStmt *op, PlannerInfo *root,
 	/*
 	 * Append the child results together.
 	 */
-	path = (Path *) create_append_path(result_rel, pathlist, NULL, 0);
+	path = (Path *) create_append_path(root, result_rel, pathlist, NULL, 0, NIL);
 
 	/* We have to manually jam the right tlist into the path; ick */
 	path->pathtarget = create_pathtarget(root, tlist);
@@ -1405,7 +1405,14 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
 		lockmode = AccessShareLock;
 
 	/* Scan for all members of inheritance set, acquire needed locks */
-	inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);
+	if (rte->relkind == RELKIND_PARTITIONED_TABLE){
+		inhOIDs = find_inheritance_children(parentOID, lockmode);
+		/* find_all_inheritors adds self ourself, but find_inheritance_children
+		 * does not. We need to add it. */
+		inhOIDs = lcons_oid(parentOID, inhOIDs);
+	}  else {
+		inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);
+	}
 
 	/*
 	 * Check that there's at least one descendant, else treat as no-child
@@ -1476,7 +1483,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
 		childrte = copyObject(rte);
 		childrte->relid = childOID;
 		childrte->relkind = newrelation->rd_rel->relkind;
-		childrte->inh = false;
+		childrte->inh = rte->relkind == RELKIND_PARTITIONED_TABLE && childOID != parentOID;
 		childrte->requiredPerms = 0;
 		childrte->securityQuals = NIL;
 		parse->rtable = lappend(parse->rtable, childrte);
@@ -1495,6 +1502,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
 		appinfo->parent_reloid = parentOID;
 		appinfos = lappend(appinfos, appinfo);
 
+
 		/*
 		 * Translate the column permissions bitmaps to the child's attnums (we
 		 * have to build the translated_vars list before we can do this). But
@@ -1537,6 +1545,13 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
 			root->rowMarks = lappend(root->rowMarks, newrc);
 		}
 
+		/*
+		 * Recursively expand the child for partitioned tables
+		 * For regular inheritance, all children are flattened
+		 */
+		expand_inherited_rtentry(root, childrte, childRTindex);
+
+
 		/* Close child relations, but keep locks */
 		if (childOID != parentOID)
 			heap_close(newrelation, NoLock);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 8ce772d274..4271dfde49 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1200,12 +1200,17 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
  * Note that we must handle subpaths = NIL, representing a dummy access path.
  */
 AppendPath *
-create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
-				   int parallel_workers)
+create_append_path(PlannerInfo *root, RelOptInfo *rel, List *subpaths, Relids required_outer,
+				   int parallel_workers, List *pathkeys)
 {
 	AppendPath *pathnode = makeNode(AppendPath);
 	ListCell   *l;
-
+	double		limit_tuples;
+	Cost		input_startup_cost;
+	Cost		input_total_cost;
+	/* Just to make sure that nobody is trying to build a parallel sorted append
+	 * path */
+	Assert(parallel_workers > 0 ? pathkeys == NIL : true);
 	pathnode->path.pathtype = T_Append;
 	pathnode->path.parent = rel;
 	pathnode->path.pathtarget = rel->reltarget;
@@ -1214,8 +1219,7 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
 	pathnode->path.parallel_aware = false;
 	pathnode->path.parallel_safe = rel->consider_parallel;
 	pathnode->path.parallel_workers = parallel_workers;
-	pathnode->path.pathkeys = NIL;		/* result is always considered
-										 * unsorted */
+	pathnode->path.pathkeys = pathkeys;
 	pathnode->subpaths = subpaths;
 
 	/*
@@ -1229,22 +1233,48 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
 	pathnode->path.rows = 0;
 	pathnode->path.startup_cost = 0;
 	pathnode->path.total_cost = 0;
+	if (root && bms_equal(rel->relids, root->all_baserels))
+		pathnode->limit_tuples = root->limit_tuples;
+	else
+		pathnode->limit_tuples = -1.0;
+	limit_tuples = pathnode->limit_tuples;
 	foreach(l, subpaths)
 	{
 		Path	   *subpath = (Path *) lfirst(l);
+		input_startup_cost = subpath->startup_cost;
+		input_total_cost = subpath->total_cost;
 
-		pathnode->path.rows += subpath->rows;
+		if(pathkeys && !pathkeys_contained_in(pathkeys, subpath->pathkeys))
+		{
+			Path		sort_path;		/* dummy for result of cost_sort */
 
+			cost_sort(&sort_path,
+					root,
+					pathkeys,
+					subpath->total_cost,
+					subpath->rows,
+					subpath->pathtarget->width,
+					0.0,
+					work_mem,
+					limit_tuples);
+			input_startup_cost += sort_path.startup_cost;
+			input_total_cost = sort_path.total_cost;
+		}
+
+		pathnode->path.rows += subpath->rows;
 		if (l == list_head(subpaths))	/* first node? */
-			pathnode->path.startup_cost = subpath->startup_cost;
-		pathnode->path.total_cost += subpath->total_cost;
+			pathnode->path.startup_cost = input_startup_cost;
+		/* Consider that the number of tuples to be fetched decreases
+		 * for every subsequent partition */
+		if(limit_tuples > 0)
+			limit_tuples = Max(1, limit_tuples - subpath->rows);
+
+		pathnode->path.total_cost += input_total_cost;
 		pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
 			subpath->parallel_safe;
-
 		/* All child paths must have same parameterization */
 		Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
 	}
-
 	return pathnode;
 }
 
@@ -3208,7 +3238,6 @@ create_limit_path(PlannerInfo *root, RelOptInfo *rel,
 				  int64 offset_est, int64 count_est)
 {
 	LimitPath  *pathnode = makeNode(LimitPath);
-
 	pathnode->path.pathtype = T_Limit;
 	pathnode->path.parent = rel;
 	/* Limit doesn't project, so use source path's pathtarget */
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 463f806467..c51685e2be 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -34,6 +34,7 @@
 #include "nodes/makefuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
+#include "optimizer/paths.h"
 #include "optimizer/plancat.h"
 #include "optimizer/predtest.h"
 #include "optimizer/prep.h"
@@ -409,7 +410,6 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 		rel->serverid = InvalidOid;
 		rel->fdwroutine = NULL;
 	}
-
 	/* Collect info about relation's foreign keys, if relevant */
 	get_relation_foreign_keys(root, rel, relation, inhparent);
 
@@ -1251,6 +1251,139 @@ get_relation_constraints(PlannerInfo *root,
 	return result;
 }
 
+/*
+ * Generate pathkeys for Range-based partitions
+ */
+void
+generate_pathkeys_for_partitioned_tables(PlannerInfo *root)
+{
+	int i;
+	for(i = 1; i < root->simple_rel_array_size; i++)
+	{
+		RelOptInfo *rel = root->simple_rel_array[i];
+		/* Only base relation can be partitionned */
+		if(rel && has_useful_pathkeys(root, rel))
+		{
+			Index varno = rel->relid;
+			Index parent_varno = varno;
+			RelOptInfo *parent_rel = rel;
+			Relation relation;
+			RangeTblEntry *rte = planner_rt_fetch(varno, root);
+			if(rte->relkind != RELKIND_PARTITIONED_TABLE)
+				continue;
+			while(parent_rel->reloptkind != RELOPT_BASEREL)
+			{
+				ListCell *lc;
+				foreach(lc, root->append_rel_list)
+				{
+					AppendRelInfo *ari = lfirst(lc);
+					if(ari->child_relid == parent_rel->relid){
+						parent_rel = root->simple_rel_array[ari->parent_relid];
+						break;
+					}
+					/* Should never happen: we should always be able to climb up the
+					 *  inheritance tree */
+					if(!lc)
+					{
+						elog(ERROR, "Unable to find parent table for child ");
+					}
+
+				}
+
+			}
+			parent_varno = parent_rel->relid;
+
+
+			/*
+			 * Ignore base rel if it's not a partitionned table. We'll still
+			 * have to open it to verify if it's a range partitionned table
+			 */
+			if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+				continue;
+
+			relation = heap_open(rte->relid, NoLock);
+
+			/*
+			 * Store the child partitions OIDs and build pathkeys for the
+			 * partitioning keys
+			 */
+			if(relation->rd_partkey->strategy == PARTITION_STRATEGY_RANGE)
+			{
+				int j;
+				ListCell *lc;
+				Oid equality_op;
+				EquivalenceClass *ec;
+				List *opfamilies;
+				List *asc_pathkeys = NIL;
+				List *desc_pathkeys = NIL;
+				Assert(rel->rel_sorted_part_oids == NIL);
+				Assert(rel->rel_partitioned_pathkeys == NIL);
+
+				for(j = 0; j < relation->rd_partdesc->nparts; j++)
+				{
+					rel->rel_sorted_part_oids =
+						lappend_oid(rel->rel_sorted_part_oids,
+								relation->rd_partdesc->oids[j]);
+				}
+
+				/* Lookup individual vars from the pathtarget */
+				/* FIXME: refactor this in an external function */
+				lc = list_head(relation->rd_partkey->partexprs);
+				for(j=0; j < relation->rd_partkey->partnatts; j++)
+				{
+					AttrNumber attno = relation->rd_partkey->partattrs[j];
+					Expr *expr = NULL;
+					/* This is not an attribute, but an expression */
+					if(attno == InvalidAttrNumber)
+					{
+						/* Should never append : we should be able to fetch
+						 * an expression for anything in the partition key */
+						if (!lc)
+							elog(ERROR, "Could not find expression for partition key");
+						expr = lfirst(lc);
+						lc = lnext(lc);
+					}
+					else
+					{
+						expr = (Expr*) makeVar(parent_varno, attno, relation->rd_partkey->parttypid[j],
+									  relation->rd_partkey->parttypmod[j],
+									  relation->rd_partkey->parttypcoll[j],
+									  0);
+					}
+					equality_op = get_opfamily_member(relation->rd_partkey->partopfamily[j],
+																		  relation->rd_partkey->partopcintype[j],
+																		  relation->rd_partkey->partopcintype[j],
+																		  BTEqualStrategyNumber);
+					opfamilies = get_mergejoin_opfamilies(equality_op);
+					ec = get_eclass_for_sort_expr(root, expr,
+							NULL, opfamilies,
+							relation->rd_partkey->partopcintype[j],
+							relation->rd_partkey->partcollation[j],
+							0, rel->relids, true);
+					asc_pathkeys = lappend(asc_pathkeys,  make_canonical_pathkey(root,
+							ec,
+							relation->rd_partkey->partopfamily[j],
+							BTLessStrategyNumber, false));
+					desc_pathkeys  = lappend(desc_pathkeys, make_canonical_pathkey(root,
+							ec,
+							relation->rd_partkey->partopfamily[j],
+							BTGreaterStrategyNumber, true));
+				}
+				/* FIXME: this is as dirty as it gets */
+				if(list_length(asc_pathkeys) > list_length(root->query_pathkeys))
+				{
+					asc_pathkeys = truncate_useless_pathkeys(root, rel, asc_pathkeys);
+					desc_pathkeys = truncate_useless_pathkeys(root, rel, desc_pathkeys);
+				}
+				if(asc_pathkeys)
+					rel->rel_partitioned_pathkeys = lappend(rel->rel_partitioned_pathkeys, asc_pathkeys);
+				if(desc_pathkeys)
+					rel->rel_partitioned_pathkeys = lappend(rel->rel_partitioned_pathkeys, desc_pathkeys);
+			}
+			heap_close(relation, NoLock);
+		}
+	}
+}
 
 /*
  * relation_excluded_by_constraints
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 6ab78545c3..b945a86fe7 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -143,6 +143,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind)
 	rel->baserestrict_min_security = UINT_MAX;
 	rel->joininfo = NIL;
 	rel->has_eclass_joins = false;
+	rel->rel_partitioned_pathkeys = NIL;
+	rel->rel_sorted_part_oids = NIL;
 
 	/* Check type of rtable entry */
 	switch (rte->rtekind)
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b880dc16cf..9532c09d37 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -228,6 +228,18 @@ typedef struct Append
 {
 	Plan		plan;
 	List	   *appendplans;
+	/* remaining fields are just like the sort-key info in struct Sort */
+	/* FIXME: We should either
+	 * 	- define Append as sort + appendplans
+	 * 	- define Append "like" sort and define MergeAppend as Append, only with
+	 * 	a different tag
+	 */
+	int			numCols;		/* number of sort-key columns */
+	AttrNumber *sortColIdx;		/* their indexes in the target list */
+	Oid		   *sortOperators;	/* OIDs of operators to sort them by */
+	Oid		   *collations;		/* OIDs of collations */
+	bool	   *nullsFirst;		/* NULLS FIRST/LAST directions */
+
 } Append;
 
 /* ----------------
@@ -237,14 +249,7 @@ typedef struct Append
  */
 typedef struct MergeAppend
 {
-	Plan		plan;
-	List	   *mergeplans;
-	/* remaining fields are just like the sort-key info in struct Sort */
-	int			numCols;		/* number of sort-key columns */
-	AttrNumber *sortColIdx;		/* their indexes in the target list */
-	Oid		   *sortOperators;	/* OIDs of operators to sort them by */
-	Oid		   *collations;		/* OIDs of collations */
-	bool	   *nullsFirst;		/* NULLS FIRST/LAST directions */
+	Append 		plan;
 } MergeAppend;
 
 /* ----------------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 05d6f07aea..d893bacf6e 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -531,6 +531,9 @@ typedef struct RelOptInfo
 	PlannerInfo *subroot;		/* if subquery */
 	List	   *subplan_params; /* if subquery */
 	int			rel_parallel_workers;	/* wanted number of parallel workers */
+	List		*rel_sorted_part_oids; /* if range partitioning */
+	List		*rel_partitioned_pathkeys;
+
 
 	/* Information about foreign tables and foreign joins */
 	Oid			serverid;		/* identifies server for the table or join */
@@ -1117,6 +1120,7 @@ typedef struct AppendPath
 {
 	Path		path;
 	List	   *subpaths;		/* list of component Paths */
+	double		limit_tuples;	/* hard limit on output tuples, or -1 */
 } AppendPath;
 
 #define IS_DUMMY_PATH(p) \
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 373c7221a8..ed1f9875ab 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -63,8 +63,12 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
 					  List *bitmapquals);
 extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
 					List *tidquals, Relids required_outer);
-extern AppendPath *create_append_path(RelOptInfo *rel, List *subpaths,
-				   Relids required_outer, int parallel_workers);
+extern AppendPath *create_append_path(PlannerInfo *root,
+					RelOptInfo *rel,
+					List *subpaths,
+				    Relids required_outer,
+					int parallel_workers,
+					List *pathkeys);
 extern MergeAppendPath *create_merge_append_path(PlannerInfo *root,
 						 RelOptInfo *rel,
 						 List *subpaths,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 25fe78cddd..92da03f75d 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -228,4 +228,5 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 					   EquivalenceClass *eclass, Oid opfamily,
 					   int strategy, bool nulls_first);
 
+
 #endif   /* PATHS_H */
diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h
index 68fd713b09..1cc3a861f4 100644
--- a/src/include/optimizer/plancat.h
+++ b/src/include/optimizer/plancat.h
@@ -34,7 +34,7 @@ extern void estimate_rel_size(Relation rel, int32 *attr_widths,
 				  BlockNumber *pages, double *tuples, double *allvisfrac);
 
 extern int32 get_relation_data_width(Oid relid, int32 *attr_widths);
-
+extern void generate_pathkeys_for_partitioned_tables(PlannerInfo *root);
 extern bool relation_excluded_by_constraints(PlannerInfo *root,
 								 RelOptInfo *rel, RangeTblEntry *rte);
 
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to