On Sat, Jul 18, 2009 at 12:04 PM, Greg Stark <[email protected]> wrote:
> On Sat, Jul 18, 2009 at 6:35 PM, Alan Li<[email protected]> wrote:
> > Yeah, are you running into the same issue as well? I tried to figure out
> a
> > way around the O(n^2) behavior, but there were no existing direct
> > relationship between the child subpath and its associated AppendRelInfo.
> I
> > think an AppenRelInfo dynahash would work and just have it hash by the
> > child_relid.
>
> I don't see any place in my patch where I had to do anything like
> this. I do have to loop through all the appendrel elements and skip
> over the ones which don't belong to this appendrel which seems weird
> to me but it isn't n^2.
> Generating a normal Append node you're generating a brand new path for
> each child and attaching them to the append path. It looks like you're
> allowing the append rel to generate paths and then digging around
> looking at those paths. I wonder if that's the right approach.
>
>
> >> The other thing is it would be nice if we could avoid making separate
> >> subplans for each child and instead make one for the whole structure
> >> including the aggregate. It would at the very least make the explain
> >> output prettier, but I think it would avoid repeated execution of the
> >> aggregate in some cases as well.
> >
> > How would this plan look? I think the repeated execution of the
> aggregate
> > comes from having to process the output of each child. So it's O(n)
> > executions where n is the number of children, assuming each child has the
> > appropriate index for this optimization.
>
> No I just mean instead of generating
>
> InitPlan 1
> Limit
> Index Scan
> InitPlan 2
> Limit
> Index Scan
> Aggregate
> Append
> Result
> Result
>
> I think it would be better to generate this:
>
> InitPlan 1
> Aggregate
> Append
> Limit 1
> IndexScan
> Limit 1
> IndexScan
> Result
>
> The reason I think this is better is because if the subquery is
> executed multiple times it will just return the one precalculated
> result. In your plan it will have all the child minimums precalculated
> but will still have to re-execute the aggregate and append node.
> That's not terribly expensive but we might as well generate the
> simpler more efficient plan.
>
>
> > The other optimization that I thought was that the optimizer might use
> the
> > check constraints (in the range partitioning) on the column that I want
> to
> > do a MIN/MAX on. Then it'll only have to execute the subplan in one
> child
> > partition and the parent partition. But it was more of a wild idea on my
> > part, not sure if that's possible.
>
> Yes, I think this is the long-term goal. That's the whole "better
> representation of partitions" plotline. To do this efficiently the
> planner should know what the partition key is and what the
> partitioning structure is.
>
> In an ideal world we would be able to collapse the whole structure and
> eliminate the append relation entirely. To do that we need some way to
> indicate that the parent relation is provably empty. I had in mind to
> mark it as read-only and the statistics as authoritative since that
> seems more useful than just being able to mark it empty. I think there
> are a lot of other interesting things we could do with statistics if
> we knew they were authoritative for a read-only table. (The read-only
> property we would need here is very strong. There would have to be a
> vacuum freeze and moreover we would have to know that all tuples were
> successfully frozen.)
>
> --
> greg
> http://mit.edu/~gsstark/resume.pdf <http://mit.edu/%7Egsstark/resume.pdf>
>
>
>
Attached is an updated patch that removes the O(n^2) behavior and the
awkwardness of optimizing the seqscan path as the plan was about to be
created. Now, the optimization is considered when appendrel is generating
the paths.
I also changed the plan as you had suggested. It now looks like this:
postgres=# explain select max(b) from foo_archive;
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1.22..1.23 rows=1 width=8)
-> Append (cost=0.00..1.13 rows=32 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_idx on foo_archive
(cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_01_idx on
foo_archive_2007_01_01 foo_archive (cost=0.00..2723.24 rows=86399 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_02_idx on
foo_archive_2007_01_02 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_03_idx on
foo_archive_2007_01_03 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_04_idx on
foo_archive_2007_01_04 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_05_idx on
foo_archive_2007_01_05 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_06_idx on
foo_archive_2007_01_06 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_07_idx on
foo_archive_2007_01_07 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_08_idx on
foo_archive_2007_01_08 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_09_idx on
foo_archive_2007_01_09 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_10_idx on
foo_archive_2007_01_10 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_11_idx on
foo_archive_2007_01_11 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_12_idx on
foo_archive_2007_01_12 foo_archive (cost=0.00..1568.27 rows=49601 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_13_idx on
foo_archive_2007_01_13 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_14_idx on
foo_archive_2007_01_14 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_15_idx on
foo_archive_2007_01_15 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_16_idx on
foo_archive_2007_01_16 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_17_idx on
foo_archive_2007_01_17 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_18_idx on
foo_archive_2007_01_18 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_19_idx on
foo_archive_2007_01_19 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_20_idx on
foo_archive_2007_01_20 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_21_idx on
foo_archive_2007_01_21 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_22_idx on
foo_archive_2007_01_22 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_23_idx on
foo_archive_2007_01_23 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_24_idx on
foo_archive_2007_01_24 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_25_idx on
foo_archive_2007_01_25 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_26_idx on
foo_archive_2007_01_26 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_27_idx on
foo_archive_2007_01_27 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_28_idx on
foo_archive_2007_01_28 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_29_idx on
foo_archive_2007_01_29 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_30_idx on
foo_archive_2007_01_30 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_31_idx on
foo_archive_2007_01_31 foo_archive (cost=0.00..73.35 rows=1940 width=8)
(66 rows)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index b375902..0c839b9 100644
*** a/src/backend/optimizer/path/allpaths.c
--- b/src/backend/optimizer/path/allpaths.c
***************
*** 27,32 ****
--- 27,33 ----
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
+ #include "optimizer/planmain.h"
#include "optimizer/planner.h"
#include "optimizer/prep.h"
#include "optimizer/restrictinfo.h"
*************** set_append_rel_pathlist(PlannerInfo *roo
*** 289,294 ****
--- 290,296 ----
double *parent_attrsizes;
int nattrs;
ListCell *l;
+ bool can_optimize_minmax;
/*
* Initialize to compute size estimates for whole append relation.
*************** set_append_rel_pathlist(PlannerInfo *roo
*** 309,314 ****
--- 311,319 ----
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
+
+ can_optimize_minmax = can_optimize_append_minmax(root);
+
/*
* Generate access paths for each member relation, and pick the cheapest
* path for each one.
*************** set_append_rel_pathlist(PlannerInfo *roo
*** 427,433 ****
--- 432,454 ----
subpaths = list_concat(subpaths,
((AppendPath *) childpath)->subpaths);
else
+ {
+ if (can_optimize_minmax)
+ {
+ Path *minmax_path;
+ List *targetlist = (List *)
+ adjust_appendrel_attrs((Node *) root->parse->targetList,
+ appinfo);
+
+ minmax_path = optimize_append_minmax(root, targetlist, childrel,
+ childpath);
+
+ if (minmax_path)
+ childpath = minmax_path;
+ }
+
subpaths = lappend(subpaths, childpath);
+ }
/*
* Accumulate size information from each child.
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 6e5c251..6643486 100644
*** a/src/backend/optimizer/plan/createplan.c
--- b/src/backend/optimizer/plan/createplan.c
***************
*** 20,25 ****
--- 20,26 ----
#include <math.h>
#include "access/skey.h"
+ #include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
*************** create_append_plan(PlannerInfo *root, Ap
*** 546,551 ****
--- 547,554 ----
List *tlist = build_relation_tlist(best_path->path.parent);
List *subplans = NIL;
ListCell *subpaths;
+ bool can_optimize_minmax;
+ Node *limitCount;
/*
* It is possible for the subplans list to contain only one entry, or even
*************** create_append_plan(PlannerInfo *root, Ap
*** 566,577 ****
NULL);
}
/* Normal case with multiple subpaths */
foreach(subpaths, best_path->subpaths)
{
Path *subpath = (Path *) lfirst(subpaths);
! subplans = lappend(subplans, create_plan(root, subpath));
}
plan = make_append(subplans, false, tlist);
--- 569,591 ----
NULL);
}
+ can_optimize_minmax = can_optimize_append_minmax(root);
+ if (can_optimize_minmax)
+ limitCount = (Node *) makeConst(INT8OID, -1, sizeof(int64),
+ Int64GetDatum(1), false,
+ FLOAT8PASSBYVAL);
+
/* Normal case with multiple subpaths */
foreach(subpaths, best_path->subpaths)
{
Path *subpath = (Path *) lfirst(subpaths);
+ Plan *subplan = create_plan(root, subpath);
! /* apply the LIMIT 1 on top of the index scan */
! if (can_optimize_minmax && IsA(subplan, IndexScan))
! subplan = (Plan *) make_limit(subplan, NULL, limitCount, 0, 1);
!
! subplans = lappend(subplans, subplan);
}
plan = make_append(subplans, false, tlist);
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index 34e9617..f7cdc5e 100644
*** a/src/backend/optimizer/plan/planagg.c
--- b/src/backend/optimizer/plan/planagg.c
*************** optimize_minmax_aggregates(PlannerInfo *
*** 214,219 ****
--- 214,320 ----
}
/*
+ * can_optimize_append_minmax - check whether optimizing MIN/MAX via indexes
+ * can potentially be pushed down to child relations.
+ *
+ * This checks whether the query is of the form:
+ * 1. SELECT MAX(col) FROM tab WHERE ...;
+ * 2. SELECT MIN(col) FROM tab WHERE ...;
+ */
+ bool
+ can_optimize_append_minmax(PlannerInfo *root)
+ {
+ Query *parse = root->parse;
+ List *tlist = parse->targetList;
+ List *aggs_list;
+
+ /* Nothing to do if query has no aggregates */
+ if (!parse->hasAggs)
+ return false;
+
+ Assert(!parse->setOperations); /* shouldn't get here if a setop */
+ Assert(parse->rowMarks == NIL); /* nor if FOR UPDATE */
+
+ /*
+ * Reject unoptimizable cases.
+ *
+ * We don't handle GROUP BY or windowing, because our current
+ * implementations of grouping require looking at all the rows anyway, and
+ * so there's not much point in optimizing MIN/MAX.
+ */
+ if (parse->groupClause || parse->hasWindowFuncs)
+ return false;
+
+ /* We're only supporting a single MIN/MAX in the targetlist. */
+ if (list_length(tlist) > 1 || parse->havingQual)
+ return false;
+
+ aggs_list = NIL;
+ if (find_minmax_aggs_walker((Node *) tlist, &aggs_list))
+ return false;
+
+ if (list_length(aggs_list) > 1)
+ return false;
+
+ return true;
+ }
+
+ /*
+ * optimize_append_minmax - returns the IndexPath for a child relation as an
+ * optimization for MIN/MAX. See can_optimize_append_minmax() above. Returns
+ * NULL if no IndexPath is available or it is not profitable compared to the
+ * existing best_path.
+ */
+ Path *
+ optimize_append_minmax(PlannerInfo *root, List *tlist, RelOptInfo *rel,
+ Path *best_path)
+ {
+ List *aggs_list;
+ Cost total_cost;
+ MinMaxAggInfo *info;
+ Path agg_p;
+ Path minmax_agg_p;
+
+ /* Pass 1: Get the one MIN/MAX from the targetlist */
+ aggs_list = NIL;
+ if (find_minmax_aggs_walker((Node *) tlist, &aggs_list))
+ return NULL;
+
+ Assert(list_length(aggs_list) == 1);
+
+ /* Pass 2: see if MIN/MAX is optimizable */
+ info = (MinMaxAggInfo *) linitial(aggs_list);
+ if (!build_minmax_path(root, rel, info))
+ return NULL;
+ total_cost = info->pathcost;
+
+ /*
+ * Make the cost comparison.
+ *
+ * We are comparing the cost of the best_path that has to pass all of its
+ * output as input to the aggregate node against the one output tuple in
+ * the optimized path.
+ *
+ * Note that we don't include evaluation cost of the tlist here; this is
+ * OK since it isn't included in best_path's cost either, and should be
+ * the same in either case.
+ */
+ cost_agg(&agg_p, root, AGG_PLAIN, list_length(aggs_list),
+ 0, 0,
+ best_path->startup_cost, best_path->total_cost,
+ best_path->parent->rows);
+
+ /* only one input tuple expected from the minmax index path */
+ cost_agg(&minmax_agg_p, root, AGG_PLAIN, list_length(aggs_list),
+ 0, 0, 0, total_cost, 1);
+
+ if (minmax_agg_p.total_cost > agg_p.total_cost)
+ return NULL; /* too expensive */
+
+ return (Path *) info->path;
+ }
+
+ /*
* find_minmax_aggs_walker
* Recursively scan the Aggref nodes in an expression tree, and check
* that each one is a MIN/MAX aggregate. If so, build a list of the
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 0dd2bcc..61dd0a5 100644
*** a/src/include/optimizer/planmain.h
--- b/src/include/optimizer/planmain.h
*************** extern void query_planner(PlannerInfo *r
*** 34,39 ****
--- 34,42 ----
*/
extern Plan *optimize_minmax_aggregates(PlannerInfo *root, List *tlist,
Path *best_path);
+ extern bool can_optimize_append_minmax(PlannerInfo *root);
+ extern Path *optimize_append_minmax(PlannerInfo *root, List *tlist,
+ RelOptInfo *rel, Path *best_path);
/*
* prototypes for plan/createplan.c
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers