> So, I am thinking about your approach of creating PartitionJoinPaths
> without actually creating child paths and then, at a later stage,
> actually planning the child joins. Here's a rough sketch of how that
> may be done.
>
> At the time of creating regular paths, we identify the join orders
> which can use partition-wise join and save those in the RelOptInfo of
> the parent table. If no such join order exists, we do not create
> PartitionJoinPaths for that relation. Otherwise, once we have
> considered all the join orders, i.e. in
> generate_partition_wise_join_paths(), we create one PartitionJoinPath
> for every path that has survived in the parent, or at least one for
> every path that has distinct properties like pathkeys or
> parameterisation, with those properties.
>
> At the time of creating plans, if a PartitionJoinPath is chosen, we
> actually create paths for every partition of that relation
> recursively. The path creation logic is carried out in a different
> memory context. Amongst the paths that survive, we choose the best
> path that has the same properties as the PartitionJoinPath. We would
> expect all parameterized paths to be retained, and any unparameterized
> path can be sorted to match the pathkeys of the reference
> PartitionJoinPath. We then create the plan out of this path, copy it
> into the outer memory context, and release the memory context used
> for path creation. This is similar to how prepared statements save
> their plans. Once we have the plan, the memory consumed by the paths
> won't be referenced, and hence cannot cause problems. At the end we
> create an Append/MergeAppend plan with all the child plans and return
> it.
>
> Costing a PartitionJoinPath needs more thought so that we don't end
> up with bad overall plans. Here's an idea. Partition-wise joins are
> better than unpartitioned ones because of the smaller sizes of the
> partitions. If we think of a join as an O(MN) operation, where M and
> N are the sizes of the unpartitioned tables being joined, a
> partition-wise join computes P joins, each of average order
> O(M/P * N/P), where P is the number of partitions; that is still
> O(MN), with the constant factor reduced P times. I think we need to
> apply similar logic to costing. Let's say the cost of a join is
> J(M, N) = S(M, N) + R(M, N), where S and R are the setup cost and the
> joining cost (for M, N rows) respectively. The cost of the
> partition-wise join would then be
> P * J(M/P, N/P) = P * S(M/P, N/P) + P * R(M/P, N/P). Each of the join
> methods has different S and R functions, which may not be linear in
> the number of rows. So, PartitionJoinPath costs are obtained from the
> corresponding regular path costs subjected to the above
> transformation. This way, we are protected from choosing a
> PartitionJoinPath when it is not optimal. Take the example of a join
> where the joining relations are very small, so that a hash join on
> the full relations is optimal compared to a hash join of each pair of
> partitions because of the setup cost. In such a case, the function
> which calculates the cost of hash table setup would produce almost
> the same cost for the full table as for each of the partitions, thus
> inflating P * S(M/P, N/P) compared to S(M, N).
>
> Let me know your comments.
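To make the quoted costing idea concrete, here is a toy, self-contained
illustration (the cost functions below are invented for demonstration and
are not PostgreSQL's costing code) of both sides of that argument: with a
largely fixed setup cost S and a run cost R that grows like M * N, the
transformed cost P * J(M/P, N/P) loses to J(M, N) on small relations and
wins on large ones.

#include <stdio.h>

/* Invented setup cost: a fixed overhead plus a small per-inner-row term. */
static double
setup_cost(double m, double n)
{
	return 100.0 + 0.01 * n;
}

/* Invented run cost: proportional to M * N, as in the O(MN) argument above. */
static double
run_cost(double m, double n)
{
	return 0.0001 * m * n;
}

/* J(M, N) = S(M, N) + R(M, N) */
static double
join_cost(double m, double n)
{
	return setup_cost(m, n) + run_cost(m, n);
}

int
main(void)
{
	double		sizes[] = {100.0, 10000.0};
	int			P = 10;
	int			i;

	for (i = 0; i < 2; i++)
	{
		double		M = sizes[i];
		double		N = sizes[i];

		printf("M = N = %6.0f: J(M, N) = %8.2f, P * J(M/P, N/P) = %8.2f\n",
			   M, join_cost(M, N), P * join_cost(M / P, N / P));
	}
	return 0;
}

With these made-up numbers, M = N = 100 gives J(M, N) of roughly 102
against P * J(M/P, N/P) of roughly 1001, so the setup term dominates and
the plain join is preferred, while M = N = 10000 gives about 10200 against
2100, where the partition-wise join wins.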
I tried to measure the impact of having a memory context reset 1000
times (once for each partition) with the attached patch. Without this
patch, make check in regress/ takes about 24 seconds on my laptop, and
with this patch it takes 26 seconds. That is almost a 10% increase in
time. I hope that's fine.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d8c5dd3..abc34aa 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -54,6 +54,7 @@
 #include "utils/rel.h"
 #include "utils/selfuncs.h"
 #include "utils/lsyscache.h"
+#include "utils/memutils.h"
 #include "utils/syscache.h"
@@ -192,6 +193,9 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	Plan	   *top_plan;
 	ListCell   *lp,
 			   *lr;
+	MemoryContext temp_context;
+	int			i;
+	MemoryContext default_context;
 
 	/* Cursor options may come from caller or from DECLARE CURSOR stmt */
 	if (parse->utilityStmt &&
@@ -432,6 +436,24 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	result->invalItems = glob->invalItems;
 	result->nParamExec = glob->nParamExec;
+	temp_context = AllocSetContextCreate(CurrentMemoryContext,
+										 "TemporaryContext",
+										 ALLOCSET_DEFAULT_SIZES);
+
+	/* Test the time impact of creating and destroying 1000 memory contexts. */
+	for (i = 0; i < 1000; i++)
+	{
+		RelOptInfo *rel;
+		default_context = MemoryContextSwitchTo(temp_context);
+		rel = makeNode(RelOptInfo);
+		pfree(rel);
+		MemoryContextSwitchTo(default_context);
+		MemoryContextResetAndDeleteChildren(temp_context);
+	}
+
+	MemoryContextSwitchTo(default_context);
+	MemoryContextDelete(temp_context);
+
 	return result;
 }
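For reference, here is a minimal sketch (not part of the attached patch;
make_plan_for_child_join() is a placeholder name, and the real code would
live in the optimizer and include utils/memutils.h) of the usage pattern
the quoted proposal describes: build one child join's paths in a
short-lived context, copy the chosen plan back into the caller's context,
and throw everything else away with a single reset. That per-partition
switch-and-reset is the overhead the 1000-iteration loop above tries to
measure.

static Plan *
plan_child_join_in_temp_context(PlannerInfo *root, RelOptInfo *child_joinrel,
								MemoryContext temp_context)
{
	MemoryContext default_context;
	Plan	   *child_plan;
	Plan	   *result;

	/* Create and cost this child join's paths in the short-lived context. */
	default_context = MemoryContextSwitchTo(temp_context);
	child_plan = make_plan_for_child_join(root, child_joinrel);	/* placeholder */

	/* Copy the chosen plan into the caller's context ... */
	MemoryContextSwitchTo(default_context);
	result = copyObject(child_plan);

	/* ... and release all the paths and the temporary plan in one reset. */
	MemoryContextResetAndDeleteChildren(temp_context);

	return result;
}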