On 13.09.2017 14:00, Simon Riggs wrote:
On 13 September 2017 at 11:30, Konstantin Knizhnik
<k.knizh...@postgrespro.ru> wrote:

The only reason of all this discussion about terms is that I need to choose
name for correspondent index option.
Simon think that we do not need this option at all. In this case we should
not worry about right term.
 From my point of view, "projection" is quite clear notion and not only for
mathematics. It is also widely used in IT and especially in DBMSes.
If we do have an option it won't be using fancy mathematical
terminology at all, it would be described in terms of its function,
e.g. recheck_on_update

Yes, I'd rather not have an option at all, just some simple code with
useful effect, like we have in many other places.

Attached please find new version of projection functional index optimization patch. I have implemented very simple autotune strategy: now I use table statistic to compare total number of updates with number of hot updates. If fraction of hot updates is relatively small, then there is no sense to spend time performing extra evaluation of index expression and comparing its old and new values.
Right now the formula is the following:

#define MIN_UPDATES_THRESHOLD 10
#define HOT_RATIO_THRESHOLD   2

        if (stat->tuples_updated > MIN_UPDATES_THRESHOLD
&& stat->tuples_updated > stat->tuples_hot_updated*HOT_RATIO_THRESHOLD)
        {
/* If percent of hot updates is small, then disable projection index function * optimization to eliminate overhead of extra index expression evaluations.
             */
            ii->ii_Projection = false;
        }

This threshold values are pulled out of a hat: I am not sure if this heuristic is right.
I will be please to get feedback if such approach to autotune is promising.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 83ee7d3..52189ac 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -294,8 +294,33 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
    <para>
     The optional <literal>WITH</> clause specifies <firstterm>storage
     parameters</> for the index.  Each index method has its own set of allowed
-    storage parameters.  The B-tree, hash, GiST and SP-GiST index methods all
-    accept this parameter:
+    storage parameters. All indexes accept the following parameter:
+   </para>
+
+   <variablelist>
+   <varlistentry>
+    <term><literal>projection</></term>
+    <listitem>
+     <para>
+       Functional index is based on on projection function: function which extract subset of its argument.
+       In mathematic such functions are called non-injective. For injective function if any attribute used in the indexed
+       expression is changed, then value of index expression is also changed. So to check that index is affected by the
+       update, it is enough to check the set of changed fields. By default this parameters is assigned true value and function is considered
+       as non-injective.
+       In this case change of any of indexed key doesn't mean that value of the function is changed. For example, for
+       the expression expression<literal>(bookinfo-&gt;&gt;'isbn')</literal> defined
+       for column of JSON type is changed only when ISBN is changed, which rarely happen. The same is true for most
+       functional indexes. For non-injective functions, Postgres compares values of indexed expression for old and updated tuple and updates
+       index only when function results are different. It allows to eliminate index update and use HOT update.
+       But there are extra evaluations of the functions. So if function is expensive or probability that change of indexed column will not effect
+       the function value is small, then marking index as projection may increase update speed.
+    </para>
+    </listitem>
+   </varlistentry>
+   </variablelist>
+
+   <para>
+     The B-tree, hash, GiST and SP-GiST index methods all accept this parameter:
    </para>
 
    <variablelist>
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index ec10762..b73165f 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -130,6 +130,15 @@ static relopt_bool boolRelOpts[] =
 	},
 	{
 		{
+			"projection",
+			"Evaluate functional index expression on update to check if its values is changed",
+			RELOPT_KIND_INDEX,
+			AccessExclusiveLock
+		},
+		true
+	},
+	{
+		{
 			"security_barrier",
 			"View acts as a row security barrier",
 			RELOPT_KIND_VIEW,
@@ -1301,7 +1310,7 @@ fillRelOptions(void *rdopts, Size basesize,
 				break;
 			}
 		}
-		if (validate && !found)
+		if (validate && !found && options[i].gen->kinds != RELOPT_KIND_INDEX)
 			elog(ERROR, "reloption \"%s\" not found in parse table",
 				 options[i].gen->name);
 	}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e29c5ad..05e372f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -56,6 +56,7 @@
 #include "access/xlogutils.h"
 #include "catalog/catalog.h"
 #include "catalog/namespace.h"
+#include "catalog/index.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -74,7 +75,9 @@
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
 #include "utils/tqual.h"
-
+#include "utils/memutils.h"
+#include "nodes/execnodes.h"
+#include "executor/executor.h"
 
 /* GUC variable */
 bool		synchronize_seqscans = true;
@@ -126,6 +129,7 @@ static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status
 static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
 static HeapTuple ExtractReplicaIdentity(Relation rel, HeapTuple tup, bool key_modified,
 					   bool *copy);
+static bool ProjectionIsNotChanged(Relation relation, HeapTuple oldtup, HeapTuple newtup);
 
 
 /*
@@ -3547,8 +3551,6 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	key_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_KEY);
 	id_attrs = RelationGetIndexAttrBitmap(relation,
 										  INDEX_ATTR_BITMAP_IDENTITY_KEY);
-
-
 	block = ItemPointerGetBlockNumber(otid);
 	buffer = ReadBuffer(relation, block);
 	page = BufferGetPage(buffer);
@@ -4176,8 +4178,12 @@ l2:
 		 * changed. If the page was already full, we may have skipped checking
 		 * for index columns. If so, HOT update is possible.
 		 */
-		if (hot_attrs_checked && !bms_overlap(modified_attrs, hot_attrs))
+		if (hot_attrs_checked
+			&& !bms_overlap(modified_attrs, hot_attrs)
+			&& (!relation->rd_projection || ProjectionIsNotChanged(relation, &oldtup, newtup)))
+		{
 			use_hot_update = true;
+		}
 	}
 	else
 	{
@@ -4214,6 +4220,7 @@ l2:
 
 	if (use_hot_update)
 	{
+		elog(DEBUG1, "Use hot update");
 		/* Mark the old tuple as HOT-updated */
 		HeapTupleSetHotUpdated(&oldtup);
 		/* And mark the new tuple as heap-only */
@@ -4426,6 +4433,81 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
 }
 
 /*
+ * For functional projection index compare new and old values of indexed expression.
+ * This function is used instead of comparison of modified attributes sets for non-injective functions.
+ */
+static bool ProjectionIsNotChanged(Relation relation, HeapTuple oldtup, HeapTuple newtup)
+{
+	ListCell       *l;
+	List	       *indexoidlist = RelationGetIndexList(relation);
+	EState         *estate = CreateExecutorState();
+	ExprContext    *econtext = GetPerTupleExprContext(estate);
+	TupleTableSlot *slot = MakeSingleTupleTableSlot(RelationGetDescr(relation));
+	bool            equals = true;
+	Datum	   	    old_values[INDEX_MAX_KEYS];
+	bool		    old_isnull[INDEX_MAX_KEYS];
+	Datum	   	    new_values[INDEX_MAX_KEYS];
+	bool		    new_isnull[INDEX_MAX_KEYS];
+
+	econtext->ecxt_scantuple = slot;
+
+	foreach(l, indexoidlist)
+	{
+		Oid		    indexOid = lfirst_oid(l);
+		Relation    indexDesc = index_open(indexOid, AccessShareLock);
+		IndexInfo  *indexInfo = BuildIndexInfo(indexDesc);
+		int         i;
+
+		if (indexInfo->ii_Projection)
+		{
+			ResetExprContext(econtext);
+			ExecStoreTuple(oldtup, slot, InvalidBuffer, false);
+			FormIndexDatum(indexInfo,
+						   slot,
+						   estate,
+						   old_values,
+						   old_isnull);
+
+			ExecStoreTuple(newtup, slot, InvalidBuffer, false);
+			FormIndexDatum(indexInfo,
+						   slot,
+						   estate,
+						   new_values,
+						   new_isnull);
+
+			for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+			{
+				if (old_isnull[i] != new_isnull[i])
+				{
+					equals = false;
+					break;
+				}
+				else if (!old_isnull[i])
+				{
+					Form_pg_attribute att = TupleDescAttr(RelationGetDescr(indexDesc), i);
+					if (!datumIsEqual(old_values[i], new_values[i], att->attbyval, att->attlen))
+					{
+						equals = false;
+						break;
+					}
+				}
+			}
+		}
+		index_close(indexDesc, AccessShareLock);
+
+		if (!equals)
+		{
+			break;
+		}
+	}
+	ExecDropSingleTupleTableSlot(slot);
+	FreeExecutorState(estate);
+
+	return equals;
+}
+
+
+/*
  * Check which columns are being updated.
  *
  * Given an updated tuple, determine (and return into the output bitmapset),
@@ -4450,7 +4532,6 @@ HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
 			modified = bms_add_member(modified,
 									  attnum - FirstLowInvalidHeapAttributeNumber);
 	}
-
 	return modified;
 }
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index c7b2f03..1bf44d2 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -26,6 +26,7 @@
 #include "access/amapi.h"
 #include "access/multixact.h"
 #include "access/relscan.h"
+#include "access/reloptions.h"
 #include "access/sysattr.h"
 #include "access/transam.h"
 #include "access/visibilitymap.h"
@@ -55,6 +56,7 @@
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "parser/parser.h"
+#include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
@@ -86,6 +88,13 @@ typedef struct
 				tups_inserted;
 } v_i_state;
 
+/* options common to all indexes */
+typedef struct
+{
+	int32		vl_len_;
+	bool        projection;
+} index_options;
+
 /* non-export function prototypes */
 static bool relationHasPrimaryKey(Relation rel);
 static TupleDesc ConstructTupleDescriptor(Relation heapRelation,
@@ -1628,6 +1637,9 @@ index_drop(Oid indexId, bool concurrent)
  * ----------------------------------------------------------------
  */
 
+#define MIN_UPDATES_THRESHOLD 10
+#define HOT_RATIO_THRESHOLD   2
+
 /* ----------------
  *		BuildIndexInfo
  *			Construct an IndexInfo record for an open index
@@ -1678,6 +1690,50 @@ BuildIndexInfo(Relation index)
 		ii->ii_ExclusionStrats = NULL;
 	}
 
+	if (ii->ii_Expressions)
+	{
+		HeapTuple       tuple;
+		Datum           reloptions;
+		bool            isnull;
+		PgStat_StatTabEntry* stat = pgstat_fetch_stat_tabentry(index->rd_index->indrelid);
+
+		ii->ii_Projection = true; /* by default functional index is considered as non-injective */
+
+		if (stat->tuples_updated > MIN_UPDATES_THRESHOLD
+			&& stat->tuples_updated > stat->tuples_hot_updated*HOT_RATIO_THRESHOLD)
+		{
+			/* If percent of hot updates is small, then disable projection index function
+			 * optimization to eliminate overhead of extra index expression evaluations.
+			 */
+			ii->ii_Projection = false;
+		}
+
+		tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(RelationGetRelid(index)));
+		if (!HeapTupleIsValid(tuple))
+			elog(ERROR, "cache lookup failed for relation %u", RelationGetRelid(index));
+
+		reloptions = SysCacheGetAttr(RELOID, tuple,
+									 Anum_pg_class_reloptions, &isnull);
+		if (!isnull)
+		{
+			static const relopt_parse_elt tab[] = {
+				{"projection", RELOPT_TYPE_BOOL, offsetof(index_options, projection)}
+			};
+			int                       numoptions;
+			relopt_value *options = parseRelOptions(reloptions, false,
+													RELOPT_KIND_INDEX,
+													&numoptions);
+			if (numoptions != 0)
+			{
+				index_options optstruct;
+				fillRelOptions((void *)&optstruct, sizeof(bool), options, numoptions, false, tab, lengthof(tab));
+				ii->ii_Projection = optstruct.projection;
+				pfree(options);
+			}
+		}
+		ReleaseSysCache(tuple);
+	}
+
 	/* other info */
 	ii->ii_Unique = indexStruct->indisunique;
 	ii->ii_ReadyForInserts = IndexIsReady(indexStruct);
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index b8e3780..b89fdb3 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -4866,6 +4866,7 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
 	List	   *newindexoidlist;
 	Oid			relpkindex;
 	Oid			relreplindex;
+	bool        projection = false;
 	ListCell   *l;
 	MemoryContext oldcxt;
 
@@ -4975,9 +4976,15 @@ restart:
 			}
 		}
 
-		/* Collect all attributes used in expressions, too */
-		pull_varattnos((Node *) indexInfo->ii_Expressions, 1, &indexattrs);
-
+		if (indexInfo->ii_Projection)
+		{
+			projection = true;
+		}
+		else
+		{
+			/* Collect all attributes used in expressions, too */
+			pull_varattnos((Node *) indexInfo->ii_Expressions, 1, &indexattrs);
+		}
 		/* Collect all attributes in the index predicate, too */
 		pull_varattnos((Node *) indexInfo->ii_Predicate, 1, &indexattrs);
 
@@ -5022,6 +5029,8 @@ restart:
 	bms_free(relation->rd_idattr);
 	relation->rd_idattr = NULL;
 
+	relation->rd_projection = projection;
+
 	/*
 	 * Now save copies of the bitmaps in the relcache entry.  We intentionally
 	 * set rd_indexattr last, because that's the one that signals validity of
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 1583cfa..91be7fa 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1653,11 +1653,11 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH_CONST("(");
 	/* ALTER INDEX <foo> SET|RESET ( */
 	else if (Matches5("ALTER", "INDEX", MatchAny, "RESET", "("))
-		COMPLETE_WITH_LIST3("fillfactor", "fastupdate",
-							"gin_pending_list_limit");
+		COMPLETE_WITH_LIST4("fillfactor", "fastupdate",
+							"gin_pending_list_limit", "projection");
 	else if (Matches5("ALTER", "INDEX", MatchAny, "SET", "("))
-		COMPLETE_WITH_LIST3("fillfactor =", "fastupdate =",
-							"gin_pending_list_limit =");
+		COMPLETE_WITH_LIST4("fillfactor =", "fastupdate =",
+							"gin_pending_list_limit =", "projection = ");
 
 	/* ALTER LANGUAGE <name> */
 	else if (Matches3("ALTER", "LANGUAGE", MatchAny))
diff --git a/src/include/access/reloptions.h b/src/include/access/reloptions.h
index 5cdaa3b..6015515 100644
--- a/src/include/access/reloptions.h
+++ b/src/include/access/reloptions.h
@@ -51,6 +51,7 @@ typedef enum relopt_kind
 	RELOPT_KIND_PARTITIONED = (1 << 11),
 	/* if you add a new kind, make sure you update "last_default" too */
 	RELOPT_KIND_LAST_DEFAULT = RELOPT_KIND_PARTITIONED,
+	RELOPT_KIND_INDEX = RELOPT_KIND_BTREE|RELOPT_KIND_HASH|RELOPT_KIND_GIN|RELOPT_KIND_SPGIST,
 	/* some compilers treat enums as signed ints, so we can't use 1 << 31 */
 	RELOPT_KIND_MAX = (1 << 30)
 } relopt_kind;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 90a60ab..061341f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -147,6 +147,7 @@ typedef struct IndexInfo
 	bool		ii_ReadyForInserts;
 	bool		ii_Concurrent;
 	bool		ii_BrokenHotChain;
+	bool        ii_Projection;
 	void	   *ii_AmCache;
 	MemoryContext ii_Context;
 } IndexInfo;
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 4bc61e5..4e02fbe 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -125,6 +125,8 @@ typedef struct RelationData
 	List	   *rd_fkeylist;	/* list of ForeignKeyCacheInfo (see below) */
 	bool		rd_fkeyvalid;	/* true if list has been computed */
 
+	bool        rd_projection;   /* relation contains functional index with non-injective function */
+
 	MemoryContext rd_partkeycxt;	/* private memory cxt for the below */
 	struct PartitionKeyData *rd_partkey;	/* partition key, or NULL */
 	MemoryContext rd_pdcxt;		/* private context for partdesc */
diff --git a/src/test/regress/expected/func_index.out b/src/test/regress/expected/func_index.out
new file mode 100644
index 0000000..06f0de9
--- /dev/null
+++ b/src/test/regress/expected/func_index.out
@@ -0,0 +1,29 @@
+create table keyvalue(id integer primary key, info jsonb);
+create index nameindex on keyvalue((info->>'name')) with (projection=false);
+set client_min_messages=debug1;
+insert into keyvalue values (1, '{"name": "john", "data": "some data"}');
+update keyvalue set info='{"name": "john", "data": "some other data"}' where id=1;
+set client_min_messages=notice;
+drop table keyvalue;
+create table keyvalue(id integer primary key, info jsonb);
+create index nameindex on keyvalue((info->>'name')) with (projection=true);
+set client_min_messages=debug1;
+insert into keyvalue values (1, '{"name": "john", "data": "some data"}');
+update keyvalue set info='{"name": "john", "data": "some other data"}' where id=1;
+DEBUG:  Use hot update
+update keyvalue set info='{"name": "smith", "data": "some other data"}' where id=1;
+update keyvalue set info='{"name": "smith", "data": "some more data"}' where id=1;
+DEBUG:  Use hot update
+set client_min_messages=notice;
+drop table keyvalue;
+create table keyvalue(id integer primary key, info jsonb);
+create index nameindex on keyvalue((info->>'name'));
+set client_min_messages=debug1;
+insert into keyvalue values (1, '{"name": "john", "data": "some data"}');
+update keyvalue set info='{"name": "john", "data": "some other data"}' where id=1;
+DEBUG:  Use hot update
+update keyvalue set info='{"name": "smith", "data": "some other data"}' where id=1;
+update keyvalue set info='{"name": "smith", "data": "some more data"}' where id=1;
+DEBUG:  Use hot update
+set client_min_messages=notice;
+drop table keyvalue;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 2fd3f2b..8f2cd16 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -79,7 +79,7 @@ ignore: random
 # ----------
 # Another group of parallel tests
 # ----------
-test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update namespace prepared_xacts delete
+test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index func_index update namespace prepared_xacts delete
 
 # ----------
 # Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 76b0de3..ebff4ae 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -99,6 +99,7 @@ test: portals
 test: arrays
 test: btree_index
 test: hash_index
+test: func_index
 test: update
 test: delete
 test: namespace
diff --git a/src/test/regress/sql/func_index.sql b/src/test/regress/sql/func_index.sql
new file mode 100644
index 0000000..9540c07
--- /dev/null
+++ b/src/test/regress/sql/func_index.sql
@@ -0,0 +1,29 @@
+create table keyvalue(id integer primary key, info jsonb);
+create index nameindex on keyvalue((info->>'name')) with (projection=false);
+set client_min_messages=debug1;
+insert into keyvalue values (1, '{"name": "john", "data": "some data"}');
+update keyvalue set info='{"name": "john", "data": "some other data"}' where id=1;
+set client_min_messages=notice;
+drop table keyvalue;
+
+create table keyvalue(id integer primary key, info jsonb);
+create index nameindex on keyvalue((info->>'name')) with (projection=true);
+set client_min_messages=debug1;
+insert into keyvalue values (1, '{"name": "john", "data": "some data"}');
+update keyvalue set info='{"name": "john", "data": "some other data"}' where id=1;
+update keyvalue set info='{"name": "smith", "data": "some other data"}' where id=1;
+update keyvalue set info='{"name": "smith", "data": "some more data"}' where id=1;
+set client_min_messages=notice;
+drop table keyvalue;
+
+create table keyvalue(id integer primary key, info jsonb);
+create index nameindex on keyvalue((info->>'name'));
+set client_min_messages=debug1;
+insert into keyvalue values (1, '{"name": "john", "data": "some data"}');
+update keyvalue set info='{"name": "john", "data": "some other data"}' where id=1;
+update keyvalue set info='{"name": "smith", "data": "some other data"}' where id=1;
+update keyvalue set info='{"name": "smith", "data": "some more data"}' where id=1;
+set client_min_messages=notice;
+drop table keyvalue;
+
+
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to