Re: [HACKERS] MIN/MAX optimization for partitioned table

Alan Li Mon, 20 Jul 2009 11:41:43 -0700

On Sat, Jul 18, 2009 at 12:04 PM, Greg Stark <gsst...@mit.edu> wrote:


> On Sat, Jul 18, 2009 at 6:35 PM, Alan Li<a...@truviso.com> wrote:
> > Yeah, are you running into the same issue as well?  I tried to figure out
> a
> > way around the O(n^2) behavior, but there were no existing direct
> > relationship between the child subpath and its associated AppendRelInfo.
> I
> > think an AppenRelInfo dynahash would work and just have it hash by the
> > child_relid.
>
> I don't see any place in my patch where I had to do anything like
> this. I do have to loop through all the appendrel elements and skip
> over the ones which don't belong to this appendrel which seems weird
> to me but it isn't n^2.


> Generating a normal Append node you're generating a brand new path for
> each child and attaching them to the append path. It looks like you're
> allowing the append rel to generate paths and then digging around
> looking at those paths. I wonder if that's the right approach.
>
>
> >> The other thing is it would be nice if we could avoid making separate
> >> subplans for each child and instead make one for the whole structure
> >> including the aggregate. It would at the very least make the explain
> >> output prettier, but I think it would avoid repeated execution of the
> >> aggregate in some cases as well.
> >
> > How would this plan look?  I think the repeated execution of the
> aggregate
> > comes from having to process the output of each child.  So it's O(n)
> > executions where n is the number of children, assuming each child has the
> > appropriate index for this optimization.
>
> No I just mean instead of generating
>
> InitPlan 1
>   Limit
>      Index Scan
> InitPlan 2
>   Limit
>      Index Scan
> Aggregate
>   Append
>      Result
>      Result
>
> I think it would be better to generate this:
>
> InitPlan 1
>    Aggregate
>       Append
>          Limit 1
>             IndexScan
>          Limit 1
>             IndexScan
> Result
>
> The reason I think this is better is because if the subquery is
> executed multiple times it will just return the one precalculated
> result. In your plan it will have all the child minimums precalculated
> but will still have to re-execute the aggregate and append node.
> That's not terribly expensive but we might as well generate the
> simpler more efficient plan.
>
>
> > The other optimization that I thought was that the optimizer might use
> the
> > check constraints (in the range partitioning) on the column that I want
> to
> > do a MIN/MAX on.  Then it'll only have to execute the subplan in one
> child
> > partition and the parent partition.  But it was more of a wild idea on my
> > part, not sure if that's possible.
>
> Yes, I think this is the long-term goal. That's the whole "better
> representation of partitions" plotline. To do this efficiently the
> planner should know what the partition key is and what the
> partitioning structure is.
>
> In an ideal world we would be able to collapse the whole structure and
> eliminate the append relation entirely. To do that we need some way to
> indicate that the parent relation is provably empty. I had in mind to
> mark it as read-only and the statistics as authoritative since that
> seems more useful than just being able to mark it empty. I think there
> are a lot of other interesting things we could do with statistics if
> we knew they were authoritative for a read-only table. (The read-only
> property we would need here is very strong. There would have to be a
> vacuum freeze and moreover we would have to know that all tuples were
> successfully frozen.)
>
> --
> greg
> http://mit.edu/~gsstark/resume.pdf <http://mit.edu/%7Egsstark/resume.pdf>
>
>
>
Attached is an updated patch that removes the O(n^2) behavior and the
awkwardness of optimizing the seqscan path as the plan was about to be
created.  Now, the optimization is considered when appendrel is generating
the paths.

I also changed the plan as you had suggested.  It now looks like this:

postgres=# explain select max(b) from foo_archive;
                                                                      QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1.22..1.23 rows=1 width=8)
   ->  Append  (cost=0.00..1.13 rows=32 width=8)
         ->  Limit  (cost=0.00..0.04 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_idx on foo_archive
(cost=0.00..73.35 rows=1940 width=8)
         ->  Limit  (cost=0.00..0.03 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_01_idx on
foo_archive_2007_01_01 foo_archive  (cost=0.00..2723.24 rows=86399 width=8)
         ->  Limit  (cost=0.00..0.03 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_02_idx on
foo_archive_2007_01_02 foo_archive  (cost=0.00..2719.26 rows=86400 width=8)
         ->  Limit  (cost=0.00..0.03 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_03_idx on
foo_archive_2007_01_03 foo_archive  (cost=0.00..2719.26 rows=86400 width=8)
         ->  Limit  (cost=0.00..0.03 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_04_idx on
foo_archive_2007_01_04 foo_archive  (cost=0.00..2719.26 rows=86400 width=8)
         ->  Limit  (cost=0.00..0.03 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_05_idx on
foo_archive_2007_01_05 foo_archive  (cost=0.00..2719.26 rows=86400 width=8)
         ->  Limit  (cost=0.00..0.03 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_06_idx on
foo_archive_2007_01_06 foo_archive  (cost=0.00..2719.26 rows=86400 width=8)
         ->  Limit  (cost=0.00..0.03 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_07_idx on
foo_archive_2007_01_07 foo_archive  (cost=0.00..2719.26 rows=86400 width=8)
         ->  Limit  (cost=0.00..0.03 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_08_idx on
foo_archive_2007_01_08 foo_archive  (cost=0.00..2719.26 rows=86400 width=8)
         ->  Limit  (cost=0.00..0.03 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_09_idx on
foo_archive_2007_01_09 foo_archive  (cost=0.00..2719.26 rows=86400 width=8)
         ->  Limit  (cost=0.00..0.03 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_10_idx on
foo_archive_2007_01_10 foo_archive  (cost=0.00..2719.26 rows=86400 width=8)
         ->  Limit  (cost=0.00..0.03 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_11_idx on
foo_archive_2007_01_11 foo_archive  (cost=0.00..2719.26 rows=86400 width=8)
         ->  Limit  (cost=0.00..0.03 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_12_idx on
foo_archive_2007_01_12 foo_archive  (cost=0.00..1568.27 rows=49601 width=8)
         ->  Limit  (cost=0.00..0.04 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_13_idx on
foo_archive_2007_01_13 foo_archive  (cost=0.00..73.35 rows=1940 width=8)
         ->  Limit  (cost=0.00..0.04 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_14_idx on
foo_archive_2007_01_14 foo_archive  (cost=0.00..73.35 rows=1940 width=8)
         ->  Limit  (cost=0.00..0.04 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_15_idx on
foo_archive_2007_01_15 foo_archive  (cost=0.00..73.35 rows=1940 width=8)
         ->  Limit  (cost=0.00..0.04 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_16_idx on
foo_archive_2007_01_16 foo_archive  (cost=0.00..73.35 rows=1940 width=8)
         ->  Limit  (cost=0.00..0.04 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_17_idx on
foo_archive_2007_01_17 foo_archive  (cost=0.00..73.35 rows=1940 width=8)
         ->  Limit  (cost=0.00..0.04 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_18_idx on
foo_archive_2007_01_18 foo_archive  (cost=0.00..73.35 rows=1940 width=8)
         ->  Limit  (cost=0.00..0.04 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_19_idx on
foo_archive_2007_01_19 foo_archive  (cost=0.00..73.35 rows=1940 width=8)
         ->  Limit  (cost=0.00..0.04 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_20_idx on
foo_archive_2007_01_20 foo_archive  (cost=0.00..73.35 rows=1940 width=8)
         ->  Limit  (cost=0.00..0.04 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_21_idx on
foo_archive_2007_01_21 foo_archive  (cost=0.00..73.35 rows=1940 width=8)
         ->  Limit  (cost=0.00..0.04 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_22_idx on
foo_archive_2007_01_22 foo_archive  (cost=0.00..73.35 rows=1940 width=8)
         ->  Limit  (cost=0.00..0.04 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_23_idx on
foo_archive_2007_01_23 foo_archive  (cost=0.00..73.35 rows=1940 width=8)
         ->  Limit  (cost=0.00..0.04 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_24_idx on
foo_archive_2007_01_24 foo_archive  (cost=0.00..73.35 rows=1940 width=8)
         ->  Limit  (cost=0.00..0.04 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_25_idx on
foo_archive_2007_01_25 foo_archive  (cost=0.00..73.35 rows=1940 width=8)
         ->  Limit  (cost=0.00..0.04 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_26_idx on
foo_archive_2007_01_26 foo_archive  (cost=0.00..73.35 rows=1940 width=8)
         ->  Limit  (cost=0.00..0.04 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_27_idx on
foo_archive_2007_01_27 foo_archive  (cost=0.00..73.35 rows=1940 width=8)
         ->  Limit  (cost=0.00..0.04 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_28_idx on
foo_archive_2007_01_28 foo_archive  (cost=0.00..73.35 rows=1940 width=8)
         ->  Limit  (cost=0.00..0.04 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_29_idx on
foo_archive_2007_01_29 foo_archive  (cost=0.00..73.35 rows=1940 width=8)
         ->  Limit  (cost=0.00..0.04 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_30_idx on
foo_archive_2007_01_30 foo_archive  (cost=0.00..73.35 rows=1940 width=8)
         ->  Limit  (cost=0.00..0.04 rows=1 width=8)
               ->  Index Scan Backward using foo_archive_2007_01_31_idx on
foo_archive_2007_01_31 foo_archive  (cost=0.00..73.35 rows=1940 width=8)
(66 rows)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index b375902..0c839b9 100644
*** a/src/backend/optimizer/path/allpaths.c
--- b/src/backend/optimizer/path/allpaths.c
***************
*** 27,32 ****
--- 27,33 ----
  #include "optimizer/pathnode.h"
  #include "optimizer/paths.h"
  #include "optimizer/plancat.h"
+ #include "optimizer/planmain.h"
  #include "optimizer/planner.h"
  #include "optimizer/prep.h"
  #include "optimizer/restrictinfo.h"
*************** set_append_rel_pathlist(PlannerInfo *roo
*** 289,294 ****
--- 290,296 ----
  	double	   *parent_attrsizes;
  	int			nattrs;
  	ListCell   *l;
+ 	bool		can_optimize_minmax;
  
  	/*
  	 * Initialize to compute size estimates for whole append relation.
*************** set_append_rel_pathlist(PlannerInfo *roo
*** 309,314 ****
--- 311,319 ----
  	nattrs = rel->max_attr - rel->min_attr + 1;
  	parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
  
+ 
+ 	can_optimize_minmax = can_optimize_append_minmax(root);
+ 
  	/*
  	 * Generate access paths for each member relation, and pick the cheapest
  	 * path for each one.
*************** set_append_rel_pathlist(PlannerInfo *roo
*** 427,433 ****
--- 432,454 ----
  			subpaths = list_concat(subpaths,
  								   ((AppendPath *) childpath)->subpaths);
  		else
+ 		{
+ 			if (can_optimize_minmax)
+ 			{
+ 				Path *minmax_path;
+ 				List *targetlist = (List *)
+ 					adjust_appendrel_attrs((Node *) root->parse->targetList,
+ 													appinfo);
+ 				
+ 				minmax_path = optimize_append_minmax(root, targetlist, childrel,
+ 													 childpath);
+ 
+ 				if (minmax_path)
+ 					childpath = minmax_path;
+ 			}
+ 
  			subpaths = lappend(subpaths, childpath);
+ 		}
  
  		/*
  		 * Accumulate size information from each child.
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 6e5c251..6643486 100644
*** a/src/backend/optimizer/plan/createplan.c
--- b/src/backend/optimizer/plan/createplan.c
***************
*** 20,25 ****
--- 20,26 ----
  #include <math.h>
  
  #include "access/skey.h"
+ #include "catalog/pg_type.h"
  #include "nodes/makefuncs.h"
  #include "nodes/nodeFuncs.h"
  #include "optimizer/clauses.h"
*************** create_append_plan(PlannerInfo *root, Ap
*** 546,551 ****
--- 547,554 ----
  	List	   *tlist = build_relation_tlist(best_path->path.parent);
  	List	   *subplans = NIL;
  	ListCell   *subpaths;
+ 	bool	    can_optimize_minmax;
+ 	Node	   *limitCount;
  
  	/*
  	 * It is possible for the subplans list to contain only one entry, or even
*************** create_append_plan(PlannerInfo *root, Ap
*** 566,577 ****
  									NULL);
  	}
  
  	/* Normal case with multiple subpaths */
  	foreach(subpaths, best_path->subpaths)
  	{
  		Path	   *subpath = (Path *) lfirst(subpaths);
  
! 		subplans = lappend(subplans, create_plan(root, subpath));
  	}
  
  	plan = make_append(subplans, false, tlist);
--- 569,591 ----
  									NULL);
  	}
  
+ 	can_optimize_minmax = can_optimize_append_minmax(root);
+ 	if (can_optimize_minmax)
+ 		limitCount = (Node *) makeConst(INT8OID, -1, sizeof(int64),
+ 										Int64GetDatum(1), false,
+ 										FLOAT8PASSBYVAL);
+ 
  	/* Normal case with multiple subpaths */
  	foreach(subpaths, best_path->subpaths)
  	{
  		Path	   *subpath = (Path *) lfirst(subpaths);
+ 		Plan	   *subplan = create_plan(root, subpath);
  
! 		/* apply the LIMIT 1 on top of the index scan */
! 		if (can_optimize_minmax && IsA(subplan, IndexScan))
! 			subplan = (Plan *) make_limit(subplan, NULL, limitCount, 0, 1);
! 
! 		subplans = lappend(subplans, subplan);
  	}
  
  	plan = make_append(subplans, false, tlist);
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index 34e9617..f7cdc5e 100644
*** a/src/backend/optimizer/plan/planagg.c
--- b/src/backend/optimizer/plan/planagg.c
*************** optimize_minmax_aggregates(PlannerInfo *
*** 214,219 ****
--- 214,320 ----
  }
  
  /*
+  * can_optimize_append_minmax - check whether optimizing MIN/MAX via indexes
+  * can potentially be pushed down to child relations.
+  *
+  * This checks whether the query is of the form:
+  * 	1. SELECT MAX(col) FROM tab WHERE ...;
+  * 	2. SELECT MIN(col) FROM tab WHERE ...;
+  */
+ bool
+ can_optimize_append_minmax(PlannerInfo *root)
+ {
+ 	Query	   *parse = root->parse;
+ 	List	   *tlist = parse->targetList;
+ 	List	   *aggs_list;
+ 
+ 	/* Nothing to do if query has no aggregates */
+ 	if (!parse->hasAggs)
+ 		return false;
+ 
+ 	Assert(!parse->setOperations);		/* shouldn't get here if a setop */
+ 	Assert(parse->rowMarks == NIL);		/* nor if FOR UPDATE */
+ 
+ 	/*
+ 	 * Reject unoptimizable cases.
+ 	 *
+ 	 * We don't handle GROUP BY or windowing, because our current
+ 	 * implementations of grouping require looking at all the rows anyway, and
+ 	 * so there's not much point in optimizing MIN/MAX.
+ 	 */
+ 	if (parse->groupClause || parse->hasWindowFuncs)
+ 		return false;
+ 
+ 	/* We're only supporting a single MIN/MAX in the targetlist. */
+ 	if (list_length(tlist) > 1 || parse->havingQual)
+ 		return false;
+ 
+ 	aggs_list = NIL;
+ 	if (find_minmax_aggs_walker((Node *) tlist, &aggs_list))
+ 		return false;
+ 
+ 	if (list_length(aggs_list) > 1)
+ 		return false;
+ 
+ 	return true;
+ }
+ 
+ /*
+  * optimize_append_minmax - returns the IndexPath for a child relation as an
+  * optimization for MIN/MAX.  See can_optimize_append_minmax() above.  Returns
+  * NULL if no IndexPath is available or it is not profitable compared to the
+  * existing best_path.
+  */
+ Path *
+ optimize_append_minmax(PlannerInfo *root, List *tlist, RelOptInfo *rel,
+ 					   Path *best_path)
+ {
+ 	List	   *aggs_list;
+ 	Cost		total_cost;
+ 	MinMaxAggInfo *info;
+ 	Path		agg_p;
+ 	Path		minmax_agg_p;
+ 
+ 	/* Pass 1: Get the one MIN/MAX from the targetlist */
+ 	aggs_list = NIL;
+ 	if (find_minmax_aggs_walker((Node *) tlist, &aggs_list))
+ 		return NULL;
+ 
+ 	Assert(list_length(aggs_list) == 1);
+ 
+ 	/* Pass 2: see if MIN/MAX is optimizable */
+ 	info = (MinMaxAggInfo *) linitial(aggs_list);
+ 	if (!build_minmax_path(root, rel, info))
+ 		return NULL;
+ 	total_cost = info->pathcost;
+ 
+ 	/*
+ 	 * Make the cost comparison.
+ 	 *
+ 	 * We are comparing the cost of the best_path that has to pass all of its
+ 	 * output as input to the aggregate node against the one output tuple in
+ 	 * the optimized path.
+ 	 *
+ 	 * Note that we don't include evaluation cost of the tlist here; this is
+ 	 * OK since it isn't included in best_path's cost either, and should be
+ 	 * the same in either case.
+ 	 */
+ 	cost_agg(&agg_p, root, AGG_PLAIN, list_length(aggs_list),
+ 			 0, 0,
+ 			 best_path->startup_cost, best_path->total_cost,
+ 			 best_path->parent->rows);
+ 
+ 	/* only one input tuple expected from the minmax index path */
+ 	cost_agg(&minmax_agg_p, root, AGG_PLAIN, list_length(aggs_list),
+ 			 0, 0, 0, total_cost, 1);
+ 
+ 	if (minmax_agg_p.total_cost > agg_p.total_cost)
+ 		return NULL;			/* too expensive */
+ 
+ 	return (Path *) info->path;
+ }
+ 
+ /*
   * find_minmax_aggs_walker
   *		Recursively scan the Aggref nodes in an expression tree, and check
   *		that each one is a MIN/MAX aggregate.  If so, build a list of the
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 0dd2bcc..61dd0a5 100644
*** a/src/include/optimizer/planmain.h
--- b/src/include/optimizer/planmain.h
*************** extern void query_planner(PlannerInfo *r
*** 34,39 ****
--- 34,42 ----
   */
  extern Plan *optimize_minmax_aggregates(PlannerInfo *root, List *tlist,
  						   Path *best_path);
+ extern bool can_optimize_append_minmax(PlannerInfo *root);
+ extern Path *optimize_append_minmax(PlannerInfo *root, List *tlist,
+ 									RelOptInfo *rel, Path *best_path);
  
  /*
   * prototypes for plan/createplan.c

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] MIN/MAX optimization for partitioned table

Reply via email to