Hello, thank you for the comments. The new v8 patch is attched. At Tue, 08 Mar 2016 18:08:55 -0500, Tom Lane <t...@sss.pgh.pa.us> wrote in <21567.1457478...@sss.pgh.pa.us> > Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> writes: > > Hello, This is a (maybe) committer-ready patch of a Tomas > > Vondra's project. > > I think this needs quite a bit of work yet. A few comments:
Not a few at all. > * If we're going to pay the price of identifying implied restriction > conditions in check_partial_indexes(), we should at least recoup some > of that investment by not doing it again in create_indexscan_plan(). Moved a part of the check from create_indexscan_plan into check_partial_indexes. I noticed that we should avoid to exlude clauses that contain mutable functions so I added that. But I don't understand the reason for the following condition to refuse clause pruning. | rel->relid == root->parse->resultRelation > * create_indexscan_plan() has this comment about what it's doing: > * We can also discard quals that are implied by a partial index's > * predicate, but only in a plain SELECT; when scanning a target relation > * of UPDATE/DELETE/SELECT FOR UPDATE, we must leave such quals in the > * plan so that they'll be properly rechecked by EvalPlanQual testing. > I believe that that problem applies for this optimization as well, > and thus that you can only remove implied quals in plain SELECT. > At least, if there's some reason why that problem does *not* apply, > there had darn well better be a comment explaining why it's safe. This is done in check_partial_indexes using parse_rowmark. The problem haven't realized with the previous patch because it (accidentially) used rel->baserestirictinfo, not index->indrinfos for scan_clauses in create_scan_plan. But the way how create_scan_plan gets scan_clauses seems bad. I haven't have any clean idea to deal with it. > * Adding indexrinfos to IndexPath seems unnecessary, since that struct > already has the "index" pointer --- you can just get the field out of the > IndexOptInfo when you need it. If you insist on having the extra field, > this patch is short of the threshold for correctness on adding fields to > paths. It missed _outIndexPath for instance. Sorry for the stupid code. I don't insist to keep it. Removed. > * The additional #include in costsize.c has no apparent reason. Thank you for pointing out. Removed. > * The changes in cost_index() really ought to come with some change > in the associated comments. I tried to add a comment but it doesn't seem clear. > * Personally I'd not change the signature of > match_restriction_clauses_to_index; that's just code churn that > may have to get undone someday. No problem, reverted. > * The block comment in check_index_only needs more thought than this, > as the phrase "The same is true" is now invalid; the "same" it refers > to *isn't* the same anymore. Maybe I took this "the same" wrongly. Tried to fix it but I'm not confident on the result. > * I'm not too thrilled with injecting the initialization of > index->indrinfos into the initial loop in check_partial_indexes(). > If it stays there, I'd certainly expect the comment ahead of the > loop to be changed to have something to do with reality. But can't > we find some more-appropriate place to initialize it? Like maybe > where the IndexOptInfo is first created? I would not really expect > check_partial_indexes() to have side-effects on non-partial indexes. Mmm. That is quote right in general. IndexOptInfo is created in get_relation_info() but baserestrictinfo has not been fixed at the point. It is fixed as late as set_append_rel_size, almost just before set_rel_size, and just before the check_partial_indexes. But initializing indrinfos as a side-effect of check_partial_indexes is not good as you pointed. But it is called in two ways, set_tablesample_rel_size and set_plain_rel_size. So the only possible position of that other than check_partial_indexes is set_rel_size. > * I think the double loop in check_partial_indexes() is too cute by half. > I'd be inclined to just build the replacement list unconditionally while > we do the predicate_implied_by() tests. Those are expensive enough that > saving one lappend per implication-test is a useless optimization, > especially if it requires code as contorted and bug-prone as this. Ok, I removed the too cute part and added comment mentioning the reason for the unconditional replacement. > * The comment added to IndexOptInfo is not very adequate, and not spelled > correctly either. There's a block comment you should be adding a para to > (probably take the text you added for struct IndexPath). I understand that you are mentioning here. + List *indrinfos; /* baseristrict info which are not implied by + * indpred */ I rewritten to make sense, maybe. > And again, > there is more work to do to add a field to such a struct, eg outfuncs.c. > Usually a good way to find all the places to touch is to grep for some of > the existing field names in the struct. Sorry, I just forgot of that. (In spite that I myself give such kind of comments..) Yeah, I love find-grep on emacs. By the way, I found this comment in copyfuncs.c but I couldn't find the "subsidiary structs". | * We don't support copying RelOptInfo, IndexOptInfo, or Path nodes. | * There are some subsidiary structs that are useful to copy, though. Finally, all I added for this was one line in _outIndexOptInfo. > * I don't much care for the field name "indrinfos"; it's neither very > readable nor descriptive. Don't have a better suggestion right now > though. I agree with you. I didn't like the name so I rethought that. I followed the seeming rule that prefixing with 'ind' to the field name, but it is not for index, but for the parent relation. So I renamed it as "baserestrictinfo" in this version. > * Not sure if new regression test cases would be appropriate. The changes > in the existing cases seem a bit unfortunate actually; I'm afraid that > this may be defeating the original intent of those tests. Only aggregates.out is modifed in this patch. The comment for the test says that, > -- > -- Test cases that should be optimized into indexscans instead of > -- the generic aggregate implementation. ... > -- try it on an inheritance tree ... > explain (costs off) > select min(f1), max(f1) from minmaxtest; and > -- DISTINCT doesn't do anything useful here, but it shouldn't fail > explain (costs off) > select distinct min(f1), max(f1) from minmaxtest; Utterly no problem from the point of the comment. Although this patch removes "Index Cond"s for the index minmaxtest3i, it is simplly caused by a index predicate on the index, which is the very result of this patch. > I'm setting this back to Waiting on Author. Attached the new version v8. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
>From c3d8d0a10097ffe33cd46c1127924e28fd3ff7bd Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyot...@lab.ntt.co.jp> Date: Wed, 9 Mar 2016 16:37:00 +0900 Subject: [PATCH] patch v8 --- src/backend/nodes/outfuncs.c | 1 + src/backend/optimizer/path/allpaths.c | 18 ++++++++++ src/backend/optimizer/path/costsize.c | 14 +++++--- src/backend/optimizer/path/indxpath.c | 57 +++++++++++++++++++++++++------- src/backend/optimizer/plan/createplan.c | 41 ++++++++++++++--------- src/backend/optimizer/util/plancat.c | 6 ++++ src/include/nodes/relation.h | 6 ++-- src/test/regress/expected/aggregates.out | 8 ++--- 8 files changed, 110 insertions(+), 41 deletions(-) diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index 3119b9e..ebe503d 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -2145,6 +2145,7 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node) WRITE_OID_FIELD(relam); /* indexprs is redundant since we print indextlist */ WRITE_NODE_FIELD(indpred); + WRITE_NODE_FIELD(baserestrictinfo); WRITE_NODE_FIELD(indextlist); WRITE_BOOL_FIELD(predOK); WRITE_BOOL_FIELD(unique); diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index a08c248..628f7bd 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -334,6 +334,24 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel, switch (rel->rtekind) { case RTE_RELATION: + /* + * For index scans, some clauses in baserestrictinfo can be + * culled according to idnex definitions. So we are to + * retrieve baserestrictinfo for indexscans from + * IndexOptInfo. The baserestrictinfo has not been fixed until + * just before here so we initialize it here. + */ + if (rel->indexlist && rel->baserestrictinfo) + { + ListCell *lc; + + foreach (lc, rel->indexlist) + { + IndexOptInfo *index = (IndexOptInfo *) lfirst(lc); + index->baserestrictinfo = rel->baserestrictinfo; + } + } + if (rte->relkind == RELKIND_FOREIGN_TABLE) { /* Foreign table */ diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index 5350329..46eadbe 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -427,22 +427,26 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count) /* * Mark the path with the correct row estimate, and identify which quals * will need to be enforced as qpquals. + * + * usually, qpquals come from the rel's restriction clauses and + * ppi_clauses. For indexscan paths on partial indexes, the restriction + * clauses may be a subset of RelOptInfo's baserestrictinfo so we should + * ask IndexOptInfo for restriction clauses here. See + * check_partial_indexes() for the details. */ if (path->path.param_info) { path->path.rows = path->path.param_info->ppi_rows; - /* qpquals come from the rel's restriction clauses and ppi_clauses */ qpquals = list_concat( - extract_nonindex_conditions(baserel->baserestrictinfo, - path->indexquals), + extract_nonindex_conditions(path->indexinfo->baserestrictinfo, + path->indexquals), extract_nonindex_conditions(path->path.param_info->ppi_clauses, path->indexquals)); } else { path->path.rows = baserel->rows; - /* qpquals come from just the rel's restriction clauses */ - qpquals = extract_nonindex_conditions(baserel->baserestrictinfo, + qpquals = extract_nonindex_conditions(path->indexinfo->baserestrictinfo, path->indexquals); } diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c index ddb4ca5..2aa06d8 100644 --- a/src/backend/optimizer/path/indxpath.c +++ b/src/backend/optimizer/path/indxpath.c @@ -32,6 +32,7 @@ #include "optimizer/predtest.h" #include "optimizer/restrictinfo.h" #include "optimizer/var.h" +#include "parser/parsetree.h" #include "utils/builtins.h" #include "utils/bytea.h" #include "utils/lsyscache.h" @@ -1801,13 +1802,13 @@ check_index_only(RelOptInfo *rel, IndexOptInfo *index) * Check that all needed attributes of the relation are available from the * index. * - * XXX this is overly conservative for partial indexes, since we will - * consider attributes involved in the index predicate as required even - * though the predicate won't need to be checked at runtime. (The same is - * true for attributes used only in index quals, if we are certain that - * the index is not lossy.) However, it would be quite expensive to - * determine that accurately at this point, so for now we take the easy - * way out. + * For partial indexes we won't consider attributes involved in clauses + * implied by the index predicate, as those won't be needed at runtime. + * + * XXX Attributes used only in index quals are also removable if we are + * certain that the index is not lossy. However, it would be quite + * expensive to determine that accurately at this point, so for now we + * take the easy way out. */ /* @@ -1817,8 +1818,8 @@ check_index_only(RelOptInfo *rel, IndexOptInfo *index) */ pull_varattnos((Node *) rel->reltarget.exprs, rel->relid, &attrs_used); - /* Add all the attributes used by restriction clauses. */ - foreach(lc, rel->baserestrictinfo) + /* Add all the attributes used by index restriction clauses. */ + foreach(lc, index->baserestrictinfo) { RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc); @@ -2023,7 +2024,7 @@ static void match_restriction_clauses_to_index(RelOptInfo *rel, IndexOptInfo *index, IndexClauseSet *clauseset) { - match_clauses_to_index(index, rel->baserestrictinfo, clauseset); + match_clauses_to_index(index, index->baserestrictinfo, clauseset); } /* @@ -2696,7 +2697,6 @@ check_partial_indexes(PlannerInfo *root, RelOptInfo *rel) continue; /* don't repeat work if already proven OK */ have_partial = true; - break; } if (!have_partial) return; @@ -2743,10 +2743,14 @@ check_partial_indexes(PlannerInfo *root, RelOptInfo *rel) otherrels, rel)); - /* Now try to prove each index predicate true */ + /* + * Now try to prove each index predicate true. We'll also tweak the + * index restrictions to exclude clauses implied by the predicate. + */ foreach(lc, rel->indexlist) { IndexOptInfo *index = (IndexOptInfo *) lfirst(lc); + ListCell *lcr; if (index->indpred == NIL) continue; /* ignore non-partial indexes */ @@ -2755,6 +2759,35 @@ check_partial_indexes(PlannerInfo *root, RelOptInfo *rel) continue; /* don't repeat work if already proven OK */ index->predOK = predicate_implied_by(index->indpred, clauselist); + + /* + * Update restrictinfo for this index by excluding clauses that are + * implied by the index predicates. We first quickly check if we can + * safely remove clauses. + */ + + if (rel->relid == root->parse->resultRelation || + get_plan_rowmark(root->rowMarks, rel->relid) != NULL) + continue; + + /* + * We could avoid re-creating restrictions list if not necessary but + * the cost saved by such optimization is expected relavively too + * low. So unconditionally replace it even if it is virtually + * unchanged as the result. + */ + index->baserestrictinfo = NIL; + foreach (lcr, rel->baserestrictinfo) + { + RestrictInfo *rinfo = (RestrictInfo *) lfirst(lcr); + + /* add to index clauses if initial or not implied */ + if (contain_mutable_functions((Node *) rinfo->clause) || + !predicate_implied_by(list_make1(rinfo->clause), + index->indpred)) + index->baserestrictinfo = + lappend(index->baserestrictinfo, rinfo); + } } } diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index 88c7279..2e906e4 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -70,6 +70,7 @@ static Plan *create_plan_recurse(PlannerInfo *root, Path *best_path, int flags); +static List *get_baserestrictinfo(PlannerInfo *root, Path *best_path); static Plan *create_scan_plan(PlannerInfo *root, Path *best_path, int flags); static List *build_path_tlist(PlannerInfo *root, Path *path); @@ -488,6 +489,19 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags) return plan; } +static List * +get_baserestrictinfo(PlannerInfo *root, Path *best_path) +{ + switch (best_path->pathtype) + { + case T_IndexScan: + case T_IndexOnlyScan: + return ((IndexPath*)best_path)->indexinfo->baserestrictinfo; + default: + return best_path->parent->baserestrictinfo; + } +} + /* * create_scan_plan * Create a scan plan for the parent relation of 'best_path'. @@ -504,9 +518,11 @@ create_scan_plan(PlannerInfo *root, Path *best_path, int flags) /* * Extract the relevant restriction clauses from the parent relation. The * executor must apply all these restrictions during the scan, except for - * pseudoconstants which we'll take care of below. + * pseudoconstants which we'll take care of below. Some types of paths + * have its own baserestrictinfo which may different from that of the + * parent relations. */ - scan_clauses = rel->baserestrictinfo; + scan_clauses = get_baserestrictinfo(root, best_path); /* * If this is a parameterized scan, we also need to enforce all the join @@ -2363,21 +2379,14 @@ create_indexscan_plan(PlannerInfo *root, continue; /* simple duplicate */ if (is_redundant_derived_clause(rinfo, indexquals)) continue; /* derived from same EquivalenceClass */ - if (!contain_mutable_functions((Node *) rinfo->clause)) - { - List *clausel = list_make1(rinfo->clause); + /* + * scan_clauses don't contain clauses implied by index predicate. See + * check_partial_indexes() for detail. + */ + if (!contain_mutable_functions((Node *) rinfo->clause) && + predicate_implied_by(list_make1(rinfo->clause), indexquals)) + continue; /* provably implied by indexquals */ - if (predicate_implied_by(clausel, indexquals)) - continue; /* provably implied by indexquals */ - if (best_path->indexinfo->indpred) - { - if (baserelid != root->parse->resultRelation && - get_plan_rowmark(root->rowMarks, baserelid) == NULL) - if (predicate_implied_by(clausel, - best_path->indexinfo->indpred)) - continue; /* implied by index predicate */ - } - } qpqual = lappend(qpqual, rinfo); } diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index ad715bb..da8b24f 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -331,6 +331,12 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, */ info->indexprs = RelationGetIndexExpressions(indexRelation); info->indpred = RelationGetIndexPredicate(indexRelation); + + /* + * indrelinfos is to have maybe-modified baserestrictinfo but the + * baserestrictinfo is not fixed at this point. + */ + info->baserestrictinfo = NULL; if (info->indexprs && varno != 1) ChangeVarNodes((Node *) info->indexprs, 1, varno, 0); if (info->indpred && varno != 1) diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h index 098a486..d262a89 100644 --- a/src/include/nodes/relation.h +++ b/src/include/nodes/relation.h @@ -615,9 +615,11 @@ typedef struct IndexOptInfo List *indexprs; /* expressions for non-simple index columns */ List *indpred; /* predicate if a partial index, else NIL */ - + List *baserestrictinfo; /* a list of RestrictInfo nodes from the + * query's WHERE or JOIN conditions, maybe + * excluding those implied by the index + * predicate */ List *indextlist; /* targetlist representing index columns */ - bool predOK; /* true if predicate matches query */ bool unique; /* true if a unique index */ bool immediate; /* is uniqueness enforced immediately? */ diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out index 601bdb4..3ff6691 100644 --- a/src/test/regress/expected/aggregates.out +++ b/src/test/regress/expected/aggregates.out @@ -780,7 +780,6 @@ explain (costs off) -> Index Only Scan Backward using minmaxtest2i on minmaxtest2 Index Cond: (f1 IS NOT NULL) -> Index Only Scan using minmaxtest3i on minmaxtest3 - Index Cond: (f1 IS NOT NULL) InitPlan 2 (returns $1) -> Limit -> Merge Append @@ -792,8 +791,7 @@ explain (costs off) -> Index Only Scan using minmaxtest2i on minmaxtest2 minmaxtest2_1 Index Cond: (f1 IS NOT NULL) -> Index Only Scan Backward using minmaxtest3i on minmaxtest3 minmaxtest3_1 - Index Cond: (f1 IS NOT NULL) -(25 rows) +(23 rows) select min(f1), max(f1) from minmaxtest; min | max @@ -818,7 +816,6 @@ explain (costs off) -> Index Only Scan Backward using minmaxtest2i on minmaxtest2 Index Cond: (f1 IS NOT NULL) -> Index Only Scan using minmaxtest3i on minmaxtest3 - Index Cond: (f1 IS NOT NULL) InitPlan 2 (returns $1) -> Limit -> Merge Append @@ -830,11 +827,10 @@ explain (costs off) -> Index Only Scan using minmaxtest2i on minmaxtest2 minmaxtest2_1 Index Cond: (f1 IS NOT NULL) -> Index Only Scan Backward using minmaxtest3i on minmaxtest3 minmaxtest3_1 - Index Cond: (f1 IS NOT NULL) -> Sort Sort Key: ($0), ($1) -> Result -(28 rows) +(26 rows) select distinct min(f1), max(f1) from minmaxtest; min | max -- 1.8.3.1
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers