Re: [HACKERS] PATCH: index-only scans with partial indexes

Kyotaro HORIGUCHI Wed, 09 Mar 2016 00:31:35 -0800

Hello, thank you for the comments. The new v8 patch is attched.

At Tue, 08 Mar 2016 18:08:55 -0500, Tom Lane <t...@sss.pgh.pa.us> wrote in 
<21567.1457478...@sss.pgh.pa.us>
> Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> writes:
> > Hello, This is a (maybe) committer-ready patch of a Tomas
> > Vondra's project.
> 
> I think this needs quite a bit of work yet.  A few comments:


Not a few at all.

> * If we're going to pay the price of identifying implied restriction
> conditions in check_partial_indexes(), we should at least recoup some
> of that investment by not doing it again in create_indexscan_plan().

Moved a part of the check from create_indexscan_plan into
check_partial_indexes. 

 I noticed that we should avoid to exlude
clauses that contain mutable functions so I added that. But I
don't understand the reason for the following condition to refuse
clause pruning.

| rel->relid == root->parse->resultRelation


> * create_indexscan_plan() has this comment about what it's doing:
>      * We can also discard quals that are implied by a partial index's
>      * predicate, but only in a plain SELECT; when scanning a target relation
>      * of UPDATE/DELETE/SELECT FOR UPDATE, we must leave such quals in the
>      * plan so that they'll be properly rechecked by EvalPlanQual testing.
> I believe that that problem applies for this optimization as well,
> and thus that you can only remove implied quals in plain SELECT.
> At least, if there's some reason why that problem does *not* apply,
> there had darn well better be a comment explaining why it's safe.

This is done in check_partial_indexes using parse_rowmark.

The problem haven't realized with the previous patch because it
(accidentially) used rel->baserestirictinfo, not
index->indrinfos for scan_clauses in create_scan_plan.

But the way how create_scan_plan gets scan_clauses seems
bad. I haven't have any clean idea to deal with it.

> * Adding indexrinfos to IndexPath seems unnecessary, since that struct
> already has the "index" pointer --- you can just get the field out of the
> IndexOptInfo when you need it.  If you insist on having the extra field,
> this patch is short of the threshold for correctness on adding fields to
> paths.  It missed _outIndexPath for instance.

Sorry for the stupid code. I don't insist to keep it. Removed.

> * The additional #include in costsize.c has no apparent reason.

Thank you for pointing out. Removed.

> * The changes in cost_index() really ought to come with some change
> in the associated comments.

I tried to add a comment but it doesn't seem clear.

> * Personally I'd not change the signature of
> match_restriction_clauses_to_index; that's just code churn that
> may have to get undone someday.

No problem, reverted.

> * The block comment in check_index_only needs more thought than this,
> as the phrase "The same is true" is now invalid; the "same" it refers
> to *isn't* the same anymore.

Maybe I took this "the same" wrongly. Tried to fix it but I'm not
confident on the result.

> * I'm not too thrilled with injecting the initialization of
> index->indrinfos into the initial loop in check_partial_indexes().
> If it stays there, I'd certainly expect the comment ahead of the
> loop to be changed to have something to do with reality.  But can't
> we find some more-appropriate place to initialize it?  Like maybe
> where the IndexOptInfo is first created?  I would not really expect
> check_partial_indexes() to have side-effects on non-partial indexes.

Mmm. That is quote right in general. IndexOptInfo is created in
get_relation_info() but baserestrictinfo has not been fixed at
the point. It is fixed as late as set_append_rel_size, almost
just before set_rel_size, and just before the
check_partial_indexes. But initializing indrinfos as a
side-effect of check_partial_indexes is not good as you pointed.
But it is called in two ways, set_tablesample_rel_size and
set_plain_rel_size. So the only possible position of that other
than check_partial_indexes is set_rel_size.


> * I think the double loop in check_partial_indexes() is too cute by half.
> I'd be inclined to just build the replacement list unconditionally while
> we do the predicate_implied_by() tests.  Those are expensive enough that
> saving one lappend per implication-test is a useless optimization,
> especially if it requires code as contorted and bug-prone as this.

Ok, I removed the too cute part and added comment mentioning the
reason for the unconditional replacement.

> * The comment added to IndexOptInfo is not very adequate, and not spelled
> correctly either.  There's a block comment you should be adding a para to
> (probably take the text you added for struct IndexPath).  

I understand that you are mentioning here.

+  List     *indrinfos;    /* baseristrict info which are not implied by
+                           * indpred */

I rewritten to make sense, maybe.

> And again,
> there is more work to do to add a field to such a struct, eg outfuncs.c.
> Usually a good way to find all the places to touch is to grep for some of
> the existing field names in the struct.

Sorry, I just forgot of that. (In spite that I myself give such
kind of comments..) Yeah, I love find-grep on emacs.

By the way, I found this comment in copyfuncs.c but I couldn't
find the "subsidiary structs".

|  * We don't support copying RelOptInfo, IndexOptInfo, or Path nodes.
|  * There are some subsidiary structs that are useful to copy, though.

Finally, all I added for this was one line in _outIndexOptInfo.

> * I don't much care for the field name "indrinfos"; it's neither very
> readable nor descriptive.  Don't have a better suggestion right now
> though.

I agree with you. I didn't like the name so I rethought that. I
followed the seeming rule that prefixing with 'ind' to the field
name, but it is not for index, but for the parent relation. So I
renamed it as "baserestrictinfo" in this version.

> * Not sure if new regression test cases would be appropriate.  The changes
> in the existing cases seem a bit unfortunate actually; I'm afraid that
> this may be defeating the original intent of those tests.

Only aggregates.out is modifed in this patch. The comment for the
test says that,

> --
> -- Test cases that should be optimized into indexscans instead of
> -- the generic aggregate implementation.
...
> -- try it on an inheritance tree
...
> explain (costs off)
>   select min(f1), max(f1) from minmaxtest;

and 

> -- DISTINCT doesn't do anything useful here, but it shouldn't fail
> explain (costs off)
>   select distinct min(f1), max(f1) from minmaxtest;

Utterly no problem from the point of the comment. Although this
patch removes "Index Cond"s for the index minmaxtest3i, it is
simplly caused by a index predicate on the index, which is the
very result of this patch.

> I'm setting this back to Waiting on Author.

Attached the new version v8.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

>From c3d8d0a10097ffe33cd46c1127924e28fd3ff7bd Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyot...@lab.ntt.co.jp>
Date: Wed, 9 Mar 2016 16:37:00 +0900
Subject: [PATCH] patch v8

---
 src/backend/nodes/outfuncs.c             |  1 +
 src/backend/optimizer/path/allpaths.c    | 18 ++++++++++
 src/backend/optimizer/path/costsize.c    | 14 +++++---
 src/backend/optimizer/path/indxpath.c    | 57 +++++++++++++++++++++++++-------
 src/backend/optimizer/plan/createplan.c  | 41 ++++++++++++++---------
 src/backend/optimizer/util/plancat.c     |  6 ++++
 src/include/nodes/relation.h             |  6 ++--
 src/test/regress/expected/aggregates.out |  8 ++---
 8 files changed, 110 insertions(+), 41 deletions(-)

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 3119b9e..ebe503d 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2145,6 +2145,7 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node)
 	WRITE_OID_FIELD(relam);
 	/* indexprs is redundant since we print indextlist */
 	WRITE_NODE_FIELD(indpred);
+	WRITE_NODE_FIELD(baserestrictinfo);
 	WRITE_NODE_FIELD(indextlist);
 	WRITE_BOOL_FIELD(predOK);
 	WRITE_BOOL_FIELD(unique);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index a08c248..628f7bd 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -334,6 +334,24 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		switch (rel->rtekind)
 		{
 			case RTE_RELATION:
+				/*
+				 * For index scans, some clauses in baserestrictinfo can be
+				 * culled according to idnex definitions. So we are to
+				 * retrieve baserestrictinfo for indexscans from
+				 * IndexOptInfo. The baserestrictinfo has not been fixed until
+				 * just before here so we initialize it here.
+				 */
+				if (rel->indexlist && rel->baserestrictinfo)
+				{
+					ListCell *lc;
+
+					foreach (lc, rel->indexlist)
+					{
+						IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);
+						index->baserestrictinfo = rel->baserestrictinfo;
+					}
+				}
+				
 				if (rte->relkind == RELKIND_FOREIGN_TABLE)
 				{
 					/* Foreign table */
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 5350329..46eadbe 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -427,22 +427,26 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count)
 	/*
 	 * Mark the path with the correct row estimate, and identify which quals
 	 * will need to be enforced as qpquals.
+	 *
+	 * usually, qpquals come from the rel's restriction clauses and
+	 * ppi_clauses.  For indexscan paths on partial indexes, the restriction
+	 * clauses may be a subset of RelOptInfo's baserestrictinfo so we should
+	 * ask IndexOptInfo for restriction clauses here.  See
+	 * check_partial_indexes() for the details.
 	 */
 	if (path->path.param_info)
 	{
 		path->path.rows = path->path.param_info->ppi_rows;
-		/* qpquals come from the rel's restriction clauses and ppi_clauses */
 		qpquals = list_concat(
-					   extract_nonindex_conditions(baserel->baserestrictinfo,
-												   path->indexquals),
+			  extract_nonindex_conditions(path->indexinfo->baserestrictinfo,
+										  path->indexquals),
 			  extract_nonindex_conditions(path->path.param_info->ppi_clauses,
 										  path->indexquals));
 	}
 	else
 	{
 		path->path.rows = baserel->rows;
-		/* qpquals come from just the rel's restriction clauses */
-		qpquals = extract_nonindex_conditions(baserel->baserestrictinfo,
+		qpquals = extract_nonindex_conditions(path->indexinfo->baserestrictinfo,
 											  path->indexquals);
 	}
 
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index ddb4ca5..2aa06d8 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -32,6 +32,7 @@
 #include "optimizer/predtest.h"
 #include "optimizer/restrictinfo.h"
 #include "optimizer/var.h"
+#include "parser/parsetree.h"
 #include "utils/builtins.h"
 #include "utils/bytea.h"
 #include "utils/lsyscache.h"
@@ -1801,13 +1802,13 @@ check_index_only(RelOptInfo *rel, IndexOptInfo *index)
 	 * Check that all needed attributes of the relation are available from the
 	 * index.
 	 *
-	 * XXX this is overly conservative for partial indexes, since we will
-	 * consider attributes involved in the index predicate as required even
-	 * though the predicate won't need to be checked at runtime.  (The same is
-	 * true for attributes used only in index quals, if we are certain that
-	 * the index is not lossy.)  However, it would be quite expensive to
-	 * determine that accurately at this point, so for now we take the easy
-	 * way out.
+	 * For partial indexes we won't consider attributes involved in clauses
+	 * implied by the index predicate, as those won't be needed at runtime.
+	 *
+	 * XXX Attributes used only in index quals are also removable if we are
+	 * certain that the index is not lossy. However, it would be quite
+	 * expensive to determine that accurately at this point, so for now we
+	 * take the easy way out.
 	 */
 
 	/*
@@ -1817,8 +1818,8 @@ check_index_only(RelOptInfo *rel, IndexOptInfo *index)
 	 */
 	pull_varattnos((Node *) rel->reltarget.exprs, rel->relid, &attrs_used);
 
-	/* Add all the attributes used by restriction clauses. */
-	foreach(lc, rel->baserestrictinfo)
+	/* Add all the attributes used by index restriction clauses. */
+	foreach(lc, index->baserestrictinfo)
 	{
 		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
 
@@ -2023,7 +2024,7 @@ static void
 match_restriction_clauses_to_index(RelOptInfo *rel, IndexOptInfo *index,
 								   IndexClauseSet *clauseset)
 {
-	match_clauses_to_index(index, rel->baserestrictinfo, clauseset);
+	match_clauses_to_index(index, index->baserestrictinfo, clauseset);
 }
 
 /*
@@ -2696,7 +2697,6 @@ check_partial_indexes(PlannerInfo *root, RelOptInfo *rel)
 			continue;			/* don't repeat work if already proven OK */
 
 		have_partial = true;
-		break;
 	}
 	if (!have_partial)
 		return;
@@ -2743,10 +2743,14 @@ check_partial_indexes(PlannerInfo *root, RelOptInfo *rel)
 														 otherrels,
 														 rel));
 
-	/* Now try to prove each index predicate true */
+	/*
+	 * Now try to prove each index predicate true. We'll also tweak the
+	 * index restrictions to exclude clauses implied by the predicate.
+	 */
 	foreach(lc, rel->indexlist)
 	{
 		IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);
+		ListCell *lcr;
 
 		if (index->indpred == NIL)
 			continue;			/* ignore non-partial indexes */
@@ -2755,6 +2759,35 @@ check_partial_indexes(PlannerInfo *root, RelOptInfo *rel)
 			continue;			/* don't repeat work if already proven OK */
 
 		index->predOK = predicate_implied_by(index->indpred, clauselist);
+
+		/*
+		 * Update restrictinfo for this index by excluding clauses that are
+		 * implied by the index predicates. We first quickly check if we can
+		 * safely remove clauses.
+		 */
+
+		if (rel->relid == root->parse->resultRelation ||
+			get_plan_rowmark(root->rowMarks, rel->relid) != NULL)
+			continue;
+
+		/*
+		 * We could avoid re-creating restrictions list if not necessary but
+		 * the cost saved by such optimization is expected relavively too
+		 * low. So unconditionally replace it even if it is virtually
+		 * unchanged as the result.
+		 */
+		index->baserestrictinfo = NIL;
+		foreach (lcr, rel->baserestrictinfo)
+		{
+			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lcr);
+
+			/* add to index clauses if initial or not implied */
+			if (contain_mutable_functions((Node *) rinfo->clause) ||
+				!predicate_implied_by(list_make1(rinfo->clause),
+									 index->indpred))
+				index->baserestrictinfo =
+					lappend(index->baserestrictinfo, rinfo);
+		}
 	}
 }
 
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 88c7279..2e906e4 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -70,6 +70,7 @@
 
 static Plan *create_plan_recurse(PlannerInfo *root, Path *best_path,
 					int flags);
+static List *get_baserestrictinfo(PlannerInfo *root, Path *best_path);
 static Plan *create_scan_plan(PlannerInfo *root, Path *best_path,
 				 int flags);
 static List *build_path_tlist(PlannerInfo *root, Path *path);
@@ -488,6 +489,19 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 	return plan;
 }
 
+static List *
+get_baserestrictinfo(PlannerInfo *root, Path *best_path)
+{
+	switch (best_path->pathtype)
+	{
+	case T_IndexScan:
+	case T_IndexOnlyScan:
+		return ((IndexPath*)best_path)->indexinfo->baserestrictinfo;
+	default:
+		return best_path->parent->baserestrictinfo;
+	}
+}
+
 /*
  * create_scan_plan
  *	 Create a scan plan for the parent relation of 'best_path'.
@@ -504,9 +518,11 @@ create_scan_plan(PlannerInfo *root, Path *best_path, int flags)
 	/*
 	 * Extract the relevant restriction clauses from the parent relation. The
 	 * executor must apply all these restrictions during the scan, except for
-	 * pseudoconstants which we'll take care of below.
+	 * pseudoconstants which we'll take care of below. Some types of paths
+	 * have its own baserestrictinfo which may different from that of the
+	 * parent relations.
 	 */
-	scan_clauses = rel->baserestrictinfo;
+	scan_clauses = get_baserestrictinfo(root, best_path);
 
 	/*
 	 * If this is a parameterized scan, we also need to enforce all the join
@@ -2363,21 +2379,14 @@ create_indexscan_plan(PlannerInfo *root,
 			continue;			/* simple duplicate */
 		if (is_redundant_derived_clause(rinfo, indexquals))
 			continue;			/* derived from same EquivalenceClass */
-		if (!contain_mutable_functions((Node *) rinfo->clause))
-		{
-			List	   *clausel = list_make1(rinfo->clause);
+		/*
+		 * scan_clauses don't contain clauses implied by index predicate. See
+		 * check_partial_indexes() for detail.
+		 */
+		if (!contain_mutable_functions((Node *) rinfo->clause) &&
+			predicate_implied_by(list_make1(rinfo->clause), indexquals))
+			continue;		/* provably implied by indexquals */
 
-			if (predicate_implied_by(clausel, indexquals))
-				continue;		/* provably implied by indexquals */
-			if (best_path->indexinfo->indpred)
-			{
-				if (baserelid != root->parse->resultRelation &&
-					get_plan_rowmark(root->rowMarks, baserelid) == NULL)
-					if (predicate_implied_by(clausel,
-											 best_path->indexinfo->indpred))
-						continue;		/* implied by index predicate */
-			}
-		}
 		qpqual = lappend(qpqual, rinfo);
 	}
 
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index ad715bb..da8b24f 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -331,6 +331,12 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			 */
 			info->indexprs = RelationGetIndexExpressions(indexRelation);
 			info->indpred = RelationGetIndexPredicate(indexRelation);
+
+			/*
+			 * indrelinfos is to have maybe-modified baserestrictinfo but the
+			 * baserestrictinfo is not fixed at this point.
+			 */
+			info->baserestrictinfo = NULL;
 			if (info->indexprs && varno != 1)
 				ChangeVarNodes((Node *) info->indexprs, 1, varno, 0);
 			if (info->indpred && varno != 1)
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 098a486..d262a89 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -615,9 +615,11 @@ typedef struct IndexOptInfo
 
 	List	   *indexprs;		/* expressions for non-simple index columns */
 	List	   *indpred;		/* predicate if a partial index, else NIL */
-
+	List	   *baserestrictinfo; /* a list of RestrictInfo nodes from the
+                                 * query's WHERE or JOIN conditions, maybe
+                                 * excluding those implied by the index
+                                 * predicate */
 	List	   *indextlist;		/* targetlist representing index columns */
-
 	bool		predOK;			/* true if predicate matches query */
 	bool		unique;			/* true if a unique index */
 	bool		immediate;		/* is uniqueness enforced immediately? */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 601bdb4..3ff6691 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -780,7 +780,6 @@ explain (costs off)
                  ->  Index Only Scan Backward using minmaxtest2i on minmaxtest2
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan using minmaxtest3i on minmaxtest3
-                       Index Cond: (f1 IS NOT NULL)
    InitPlan 2 (returns $1)
      ->  Limit
            ->  Merge Append
@@ -792,8 +791,7 @@ explain (costs off)
                  ->  Index Only Scan using minmaxtest2i on minmaxtest2 minmaxtest2_1
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan Backward using minmaxtest3i on minmaxtest3 minmaxtest3_1
-                       Index Cond: (f1 IS NOT NULL)
-(25 rows)
+(23 rows)
 
 select min(f1), max(f1) from minmaxtest;
  min | max 
@@ -818,7 +816,6 @@ explain (costs off)
                  ->  Index Only Scan Backward using minmaxtest2i on minmaxtest2
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan using minmaxtest3i on minmaxtest3
-                       Index Cond: (f1 IS NOT NULL)
    InitPlan 2 (returns $1)
      ->  Limit
            ->  Merge Append
@@ -830,11 +827,10 @@ explain (costs off)
                  ->  Index Only Scan using minmaxtest2i on minmaxtest2 minmaxtest2_1
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan Backward using minmaxtest3i on minmaxtest3 minmaxtest3_1
-                       Index Cond: (f1 IS NOT NULL)
    ->  Sort
          Sort Key: ($0), ($1)
          ->  Result
-(28 rows)
+(26 rows)
 
 select distinct min(f1), max(f1) from minmaxtest;
  min | max 
-- 
1.8.3.1

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: index-only scans with partial indexes

Reply via email to