On 23.06.2018 00:43, Peter Geoghegan wrote:
On Fri, Jun 22, 2018 at 4:24 AM, Andrey V. Lepikhov
<a.lepik...@postgrespro.ru> wrote:
Based on your feedback, I have developed a second version of the patch.
In this version:
1. The high-level functions index_beginscan() and index_rescan() are no
longer used. The tree descent is made by _bt_search(), and _bt_binsrch()
is used for positioning within the page.
2. A TID list was introduced into the amtargetdelete() interface. Now only
one tree descent is needed to delete all TIDs from the list that share an
equal scan key value - the logical-duplicates deletion problem.
3. Only one WAL record is written for index tuple deletion per leaf page
per amtargetdelete() call.
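
As an illustration, here is a minimal sketch of the intended calling
pattern (the struct fields follow the attached patch; the surrounding
function and variable names are hypothetical):

#include "postgres.h"
#include "access/genam.h"

/*
 * Hypothetical caller: delete all index entries for a sorted list of
 * dead TIDs that share one scan key value (values[]/isnull[]).
 */
static void
target_delete_example(Relation irel, Relation hrel,
					  ItemPointer dead_tids, int ndead,
					  Datum *values, bool *isnull)
{
	IndexTargetDeleteInfo info;
	IndexTargetDeleteResult *stats = NULL;

	info.indexRelation = irel;
	info.heapRelation = hrel;
	info.dead_tuples = dead_tids;
	info.num_dead_tuples = ndead;
	info.isSorted = true;
	info.found_dead_tuples = (bool *) palloc0(ndead * sizeof(bool));

	/* One call -- one tree descent -- covers every TID in the list */
	stats = index_target_delete(&info, stats, values, isnull);

	pfree(info.found_dead_tuples);
	if (stats)
		pfree(stats);
}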

Cool.

What is this "race" code about?

+   buffer = ReadBufferExtended(rel, MAIN_FORKNUM, 
ItemPointerGetBlockNumber(tid), RBM_NORMAL, NULL);
+   LockBuffer(buffer, BUFFER_LOCK_SHARE);
+
+   page = (Page) BufferGetPage(buffer);
+   offnum = ItemPointerGetOffsetNumber(tid);
+   lp = PageGetItemId(page, offnum);
+
+   /*
+    * VACUUM Races: someone already remove the tuple from HEAP. Ignore it.
+    */
+   if (!ItemIdIsUsed(lp))
+       return NULL;

It looks wrong -- why should another process have set the item as
unused? And if we assume that that's possible at all, what's to stop a
third process from actually reusing the item pointer before we arrive
(at get_tuple_by_tid()), leaving us to find a tuple that is totally
unrelated to the original tuple to be deleted?

(Also, you're not releasing the buffer lock before you return.)

4. VACUUM can pre-sort the TID list to make the search for duplicates faster.

This version of the patch prevents my own "force unique keys" patch
from working, since you're not using my proposed new
_bt_search()/_bt_binsrch()/_bt_compare() interface (you're not passing
them a heap TID). It is essential that your patch be able to quickly
reach any tuple that it needs to kill. Otherwise, the worst case
performance is totally unacceptable; it will never be okay to go
through 10%+ of the index to kill a single tuple hundreds or even
thousands of times per VACUUM. It seems to me that doing this
tid_list_search() binary search is pointless -- you should instead be
relying on logical duplicates using their heap TID as a tie-breaker.
Rather than doing a binary search within tid_list_search(), you should
instead match the presorted heap TIDs at the leaf level against the
sorted in-memory TID list. You know, a bit like a merge join.
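
For concreteness, a rough sketch of that merge-style matching -- a
hypothetical helper, not part of either patch, assuming both inputs are
sorted by ItemPointerCompare():

#include "postgres.h"
#include "storage/itemptr.h"

/*
 * Hypothetical sketch: walk a leaf page's heap TIDs and the in-memory
 * dead-TID list in lockstep, advancing whichever side compares lower --
 * like a merge join.  Positions of matches (indexes into leaf_tids)
 * are stored in matched[]; the caller maps them back to page offsets.
 */
static int
merge_match_tids(ItemPointer leaf_tids, int nleaf,
				 ItemPointer dead_tids, int ndead,
				 int *matched)
{
	int			i = 0,
				j = 0,
				nmatched = 0;

	while (i < nleaf && j < ndead)
	{
		int32		cmpval = ItemPointerCompare(&leaf_tids[i],
												&dead_tids[j]);

		if (cmpval == 0)
		{
			matched[nmatched++] = i;
			i++;
			j++;
		}
		else if (cmpval < 0)
			i++;
		else
			j++;
	}
	return nmatched;
}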

I agree with you. The binary search was developed while awaiting your patch.

I suggest that you go even further than this: I think that you should
just start distributing my patch as part of your patch series. You can
change my code if you need to.

Good. I am ready to start distributing your patch. At the beginning of this work I planned to make a patch for physical TID ordering in the btree index; your patch will make that much easier.

I also suggest using "git format patch"
with simple, short commit messages to produce patches. This makes it a
lot easier to track the version of the patch, changes over time, etc.

Ok

I understand why you'd hesitate to take ownership of my code (it has
big problems!), but the reality is that all the problems that my patch
has are also problems for your patch. One patch cannot get committed
without the other, so they are already the same project. As a bonus,
my patch will probably improve the best case performance for your
patch, since multi-deletions will now have much better locality of
access.

I still believe that the patch for physical TID ordering in btree:
1) has its own value, not only for target deletion,
2) will require only a few local changes in my code,
and that these patches can be developed independently.

I have prepared a third version of the patches. Summary:
1. DEAD tuples are masked on a page during consistency checking (see the comments for the mask_dead_tuples() function).
2. Physical TID ordering is still not used.
3. The index cleanup after each quick_vacuum_index() call has been removed.

--
Andrey Lepikhov
Postgres Professional:
https://postgrespro.com
The Russian Postgres Company
From b3f8ddf54c5bd68b96db8cda4f5f382b86a4568e Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepik...@postgrespro.ru>
Date: Wed, 27 Jun 2018 09:57:50 +0500
Subject: [PATCH 1/2] Retail IndexTuple Deletion patch v.3

---
 contrib/bloom/blutils.c              |   1 +
 src/backend/access/brin/brin.c       |   1 +
 src/backend/access/gin/ginutil.c     |   1 +
 src/backend/access/gist/gist.c       |   1 +
 src/backend/access/hash/hash.c       |   1 +
 src/backend/access/index/indexam.c   |  15 ++++
 src/backend/access/nbtree/nbtree.c   | 170 +++++++++++++++++++++++++++++++++++
 src/backend/access/spgist/spgutils.c |   1 +
 src/include/access/amapi.h           |   6 ++
 src/include/access/genam.h           |  19 ++++
 src/include/access/nbtree.h          |   4 +
 11 files changed, 220 insertions(+)

diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index 6b2b9e3..96f1d47 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -126,6 +126,7 @@ blhandler(PG_FUNCTION_ARGS)
 	amroutine->ambuild = blbuild;
 	amroutine->ambuildempty = blbuildempty;
 	amroutine->aminsert = blinsert;
+	amroutine->amtargetdelete = NULL;
 	amroutine->ambulkdelete = blbulkdelete;
 	amroutine->amvacuumcleanup = blvacuumcleanup;
 	amroutine->amcanreturn = NULL;
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index e95fbbc..a0e06bd 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -103,6 +103,7 @@ brinhandler(PG_FUNCTION_ARGS)
 	amroutine->ambuild = brinbuild;
 	amroutine->ambuildempty = brinbuildempty;
 	amroutine->aminsert = brininsert;
+	amroutine->amtargetdelete = NULL;
 	amroutine->ambulkdelete = brinbulkdelete;
 	amroutine->amvacuumcleanup = brinvacuumcleanup;
 	amroutine->amcanreturn = NULL;
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 0a32182..acf14e7 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -58,6 +58,7 @@ ginhandler(PG_FUNCTION_ARGS)
 	amroutine->ambuild = ginbuild;
 	amroutine->ambuildempty = ginbuildempty;
 	amroutine->aminsert = gininsert;
+	amroutine->amtargetdelete = NULL;
 	amroutine->ambulkdelete = ginbulkdelete;
 	amroutine->amvacuumcleanup = ginvacuumcleanup;
 	amroutine->amcanreturn = NULL;
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 8a42eff..d7a53d2 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -80,6 +80,7 @@ gisthandler(PG_FUNCTION_ARGS)
 	amroutine->ambuild = gistbuild;
 	amroutine->ambuildempty = gistbuildempty;
 	amroutine->aminsert = gistinsert;
+	amroutine->amtargetdelete = NULL;
 	amroutine->ambulkdelete = gistbulkdelete;
 	amroutine->amvacuumcleanup = gistvacuumcleanup;
 	amroutine->amcanreturn = gistcanreturn;
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 0002df3..5fb32d6 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -76,6 +76,7 @@ hashhandler(PG_FUNCTION_ARGS)
 	amroutine->ambuild = hashbuild;
 	amroutine->ambuildempty = hashbuildempty;
 	amroutine->aminsert = hashinsert;
+	amroutine->amtargetdelete = NULL;
 	amroutine->ambulkdelete = hashbulkdelete;
 	amroutine->amvacuumcleanup = hashvacuumcleanup;
 	amroutine->amcanreturn = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 22b5cc9..9ebeb78 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -730,6 +730,21 @@ index_getbitmap(IndexScanDesc scan, TIDBitmap *bitmap)
 	return ntids;
 }
 
+IndexTargetDeleteResult *
+index_target_delete(IndexTargetDeleteInfo *info,
+					IndexTargetDeleteResult *stats,
+					Datum *values,
+					bool *isnull)
+{
+	Relation indexRelation = info->indexRelation;
+
+	RELATION_CHECKS;
+
+	CHECK_REL_PROCEDURE(amtargetdelete);
+
+	return indexRelation->rd_amroutine->amtargetdelete(info, stats, values, isnull);
+}
+
 /* ----------------
  *		index_bulk_delete - do mass deletion of index entries
  *
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 27a3032..b6f01a0 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -34,7 +34,9 @@
 #include "storage/smgr.h"
 #include "utils/builtins.h"
 #include "utils/index_selfuncs.h"
+#include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "utils/syscache.h"
 
 
 /* Working state needed by btvacuumpage */
@@ -127,6 +129,7 @@ bthandler(PG_FUNCTION_ARGS)
 	amroutine->ambuild = btbuild;
 	amroutine->ambuildempty = btbuildempty;
 	amroutine->aminsert = btinsert;
+	amroutine->amtargetdelete = bttargetdelete;
 	amroutine->ambulkdelete = btbulkdelete;
 	amroutine->amvacuumcleanup = btvacuumcleanup;
 	amroutine->amcanreturn = btcanreturn;
@@ -884,6 +887,173 @@ btbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 	return stats;
 }
 
+static int
+tid_list_search(ItemPointer tid, ItemPointer tid_list, int ntid, bool IsSorted)
+{
+	if (!IsSorted)
+	{
+		int		i;
+
+		/* Linear search over an unordered TID list */
+		for (i = 0; i < ntid; i++)
+			if (ItemPointerEquals(tid, &(tid_list[i])))
+				return i;
+	}
+	else
+	{
+		int		low = 0,
+				high = ntid - 1;
+
+		/* Binary search over the sorted TID list */
+		while (low <= high)
+		{
+			int			mid = low + (high - low) / 2;
+			int32		cmpval = ItemPointerCompare(tid, &tid_list[mid]);
+
+			if (cmpval == 0)
+				return mid;
+			if (cmpval < 0)
+				high = mid - 1;
+			else
+				low = mid + 1;
+		}
+	}
+	return -1;
+}
+
+IndexTargetDeleteResult*
+bttargetdelete(IndexTargetDeleteInfo *info,
+			   IndexTargetDeleteResult *stats,
+			   Datum *values,
+			   bool *isnull)
+{
+	Relation		irel = info->indexRelation;
+	Relation		hrel = info->heapRelation;
+	ScanKey			skey;
+	int				keysCount = IndexRelationGetNumberOfKeyAttributes(irel);
+	BTStack			stack;
+	Buffer			buf;
+	Page			page;
+	BTPageOpaque	opaque;
+	OffsetNumber	offnum;
+	int				ndeletable = 0;
+	OffsetNumber	deletable[MaxOffsetNumber];
+	IndexTuple		itup;
+
+	if (stats == NULL)
+		stats = (IndexTargetDeleteResult *) palloc0(sizeof(IndexTargetDeleteResult));
+
+	itup = index_form_tuple(RelationGetDescr(irel), values, isnull);
+	skey = _bt_mkscankey(irel, itup);
+
+	/* Descend the tree and position ourselves on the target leaf page. */
+	stack = _bt_search(irel, keysCount, skey, false, &buf, BT_READ, NULL);
+	_bt_freestack(stack);
+
+	/* Position ourselves at the first leaf entry that could match the key */
+	Assert(BufferIsValid(buf));
+	offnum = _bt_binsrch(irel, buf, keysCount, skey, false);
+	page = BufferGetPage(buf);
+	_bt_checkpage(irel, buf);
+	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+
+	for (;;)
+	{
+		int32		cmpval;
+		ItemId		itemid;
+		IndexTuple	itup;
+		int			pos;
+
+		/* Switch to the next page */
+		if (P_IGNORE(opaque) || (offnum > PageGetMaxOffsetNumber(page)))
+		{
+			/*
+			 * Before leaving this index page, delete all the matching
+			 * tuples found on it so far.
+			 */
+			if (ndeletable > 0)
+			{
+				LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+				LockBufferForCleanup(buf);
+
+				_bt_delitems_delete(irel, buf, deletable, ndeletable, hrel);
+
+				stats->tuples_removed += ndeletable;
+				ndeletable = 0;
+			}
+
+			/*
+			 * Check for end-of-index
+			 */
+			if (P_RIGHTMOST(opaque))
+				/* it is rightmost leaf */
+				break;
+
+			/*
+			 * Switch to the next index page
+			 */
+			buf = _bt_relandgetbuf(irel, buf, opaque->btpo_next, BT_READ);
+			page = BufferGetPage(buf);
+			_bt_checkpage(irel, buf);
+			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+			offnum = P_FIRSTDATAKEY(opaque);
+			continue;
+		}
+
+		/*
+		 * Does this index entry still satisfy the scan key?
+		 */
+		cmpval = _bt_compare(irel, keysCount, skey, page, offnum);
+
+		if (cmpval != 0)
+			/* No more index entries matching the scan key */
+			break;
+
+		/*
+		 * Load the index tuple and look for a match in the TID list of
+		 * dead heap tuples.
+		 */
+		itemid = PageGetItemId(page, offnum);
+		itup = (IndexTuple) PageGetItem(page, itemid);
+		pos = tid_list_search(&(itup->t_tid), info->dead_tuples, info->num_dead_tuples, info->isSorted);
+
+		if ((pos >= 0) && (!info->found_dead_tuples[pos]))
+		{
+			/* index entry for TID of dead tuple is found */
+			deletable[ndeletable++] = offnum;
+			info->found_dead_tuples[pos] = true;
+		}
+
+		offnum = OffsetNumberNext(offnum);
+	}
+
+	/*
+	 * Delete all found index tuples
+	 */
+	if (ndeletable > 0)
+	{
+		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+		LockBufferForCleanup(buf);
+
+		_bt_delitems_delete(irel, buf, deletable, ndeletable, hrel);
+
+		stats->tuples_removed += ndeletable;
+	}
+
+	/* Release scan key, unpin and unlock buffer */
+	_bt_freeskey(skey);
+	_bt_relbuf(irel, buf);
+
+	return stats;
+}
+
 /*
  * Post-VACUUM cleanup.
  *
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 4a9b5da..3257281 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -56,6 +56,7 @@ spghandler(PG_FUNCTION_ARGS)
 	amroutine->ambuild = spgbuild;
 	amroutine->ambuildempty = spgbuildempty;
 	amroutine->aminsert = spginsert;
+	amroutine->amtargetdelete = NULL;
 	amroutine->ambulkdelete = spgbulkdelete;
 	amroutine->amvacuumcleanup = spgvacuumcleanup;
 	amroutine->amcanreturn = spgcanreturn;
diff --git a/src/include/access/amapi.h b/src/include/access/amapi.h
index 14526a6..497b54e 100644
--- a/src/include/access/amapi.h
+++ b/src/include/access/amapi.h
@@ -76,6 +76,11 @@ typedef bool (*aminsert_function) (Relation indexRelation,
 								   IndexUniqueCheck checkUnique,
 								   struct IndexInfo *indexInfo);
 
+/* target delete */
+typedef IndexTargetDeleteResult *(*amtargetdelete_function) (IndexTargetDeleteInfo *info,
+															 IndexTargetDeleteResult *stats,
+															 Datum *values,
+															 bool *isnull);
 /* bulk delete */
 typedef IndexBulkDeleteResult *(*ambulkdelete_function) (IndexVacuumInfo *info,
 														 IndexBulkDeleteResult *stats,
@@ -207,6 +212,7 @@ typedef struct IndexAmRoutine
 	ambuild_function ambuild;
 	ambuildempty_function ambuildempty;
 	aminsert_function aminsert;
+	amtargetdelete_function amtargetdelete;
 	ambulkdelete_function ambulkdelete;
 	amvacuumcleanup_function amvacuumcleanup;
 	amcanreturn_function amcanreturn;	/* can be NULL */
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 24c720b..90b7e6f 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -51,6 +51,16 @@ typedef struct IndexVacuumInfo
 	BufferAccessStrategy strategy;	/* access strategy for reads */
 } IndexVacuumInfo;
 
+typedef struct IndexTargetDeleteInfo
+{
+	Relation	heapRelation;
+	Relation	indexRelation;			/* the index being vacuumed */
+	int			num_dead_tuples;
+	bool		isSorted;				/* dead tuples list sorted */
+	ItemPointer	dead_tuples;
+	bool*		found_dead_tuples;
+} IndexTargetDeleteInfo;
+
 /*
  * Struct for statistics returned by ambulkdelete and amvacuumcleanup
  *
@@ -79,6 +89,11 @@ typedef struct IndexBulkDeleteResult
 	BlockNumber pages_free;		/* # pages available for reuse */
 } IndexBulkDeleteResult;
 
+typedef struct IndexTargetDeleteResult
+{
+	int		tuples_removed; /* # removed during vacuum operation */
+} IndexTargetDeleteResult;
+
 /* Typedef for callback function to determine if a tuple is bulk-deletable */
 typedef bool (*IndexBulkDeleteCallback) (ItemPointer itemptr, void *state);
 
@@ -163,6 +178,10 @@ extern HeapTuple index_fetch_heap(IndexScanDesc scan);
 extern HeapTuple index_getnext(IndexScanDesc scan, ScanDirection direction);
 extern int64 index_getbitmap(IndexScanDesc scan, TIDBitmap *bitmap);
 
+extern IndexTargetDeleteResult *index_target_delete(IndexTargetDeleteInfo *info,
+													IndexTargetDeleteResult *stats,
+													Datum *values,
+													bool *isnull);
 extern IndexBulkDeleteResult *index_bulk_delete(IndexVacuumInfo *info,
 				  IndexBulkDeleteResult *stats,
 				  IndexBulkDeleteCallback callback,
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 04ecb4c..3b9461c 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -508,6 +508,10 @@ extern void btparallelrescan(IndexScanDesc scan);
 extern void btendscan(IndexScanDesc scan);
 extern void btmarkpos(IndexScanDesc scan);
 extern void btrestrpos(IndexScanDesc scan);
+extern IndexTargetDeleteResult *bttargetdelete(IndexTargetDeleteInfo *info,
+		IndexTargetDeleteResult *stats,
+		   Datum *values,
+		   bool *isnull);
 extern IndexBulkDeleteResult *btbulkdelete(IndexVacuumInfo *info,
 			 IndexBulkDeleteResult *stats,
 			 IndexBulkDeleteCallback callback,
-- 
2.7.4

From b33cd81ba0e92e02077d133009ac2ac9addc0983 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepik...@postgrespro.ru>
Date: Wed, 27 Jun 2018 10:14:19 +0500
Subject: [PATCH 2/2] Quick Vacuum Strategy patch v.3

---
 src/backend/access/heap/heapam.c    |  31 +++++++
 src/backend/access/heap/pruneheap.c |  40 ++++++++-
 src/backend/commands/vacuumlazy.c   | 162 +++++++++++++++++++++++++++++++++++-
 src/backend/utils/misc/guc.c        |  10 ++-
 src/include/access/heapam.h         |   1 +
 5 files changed, 235 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 72395a5..bf964f8 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -9393,6 +9393,36 @@ heap_sync(Relation rel)
 }
 
 /*
+ * Mask DEAD tuples at a HEAP page
+ *
+ * This is needed to satisfy wal_consistency_checking on a hot standby node.
+ *
+ * A DEAD tuple on the master retains its t_cid and t_infomask, which cannot
+ * be reconstructed by applying WAL records on a hot standby node.
+ */
+static void
+mask_dead_tuples(Page page)
+{
+	OffsetNumber	offnum;
+
+	for (offnum = FirstOffsetNumber; offnum <= PageGetMaxOffsetNumber(page); offnum = OffsetNumberNext(offnum))
+	{
+		ItemId	lp = PageGetItemId(page, offnum);
+		char*	htup_begin;
+
+		if (!ItemIdIsDead(lp))
+			continue;
+
+		if (!ItemIdHasStorage(lp))
+			continue;
+
+		htup_begin = (char *) PageGetItem(page, lp);
+		memset(htup_begin, MASK_MARKER, ItemIdGetLength(lp));
+	}
+}
+
+/*
  * Mask a heap page before performing consistency checks on it.
  */
 void
@@ -9405,6 +9435,7 @@ heap_mask(char *pagedata, BlockNumber blkno)
 
 	mask_page_hint_bits(page);
 	mask_unused_space(page);
+	mask_dead_tuples(page);
 
 	for (off = 1; off <= PageGetMaxOffsetNumber(page); off++)
 	{
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c2f5343..ebbafe3 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,6 +43,9 @@ typedef struct
 	bool		marked[MaxHeapTuplesPerPage + 1];
 } PruneState;
 
+/* Parameter for target deletion strategy in lazy vacuum */
+double target_index_deletion_factor = 0.01;
+
 /* Local functions */
 static int heap_prune_chain(Relation relation, Buffer buffer,
 				 OffsetNumber rootoffnum,
@@ -405,7 +408,7 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 			if (HeapTupleSatisfiesVacuum(&tup, OldestXmin, buffer)
 				== HEAPTUPLE_DEAD && !HeapTupleHeaderIsHotUpdated(htup))
 			{
-				heap_prune_record_unused(prstate, rootoffnum);
+				heap_prune_record_dead(prstate, rootoffnum);
 				HeapTupleHeaderAdvanceLatestRemovedXid(htup,
 													   &prstate->latestRemovedXid);
 				ndeleted++;
@@ -580,8 +583,10 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 		 */
 		for (i = 1; (i < nchain) && (chainitems[i - 1] != latestdead); i++)
 		{
-			heap_prune_record_unused(prstate, chainitems[i]);
 			ndeleted++;
+			if (chainitems[i] == latestdead)
+				continue;
+			heap_prune_record_unused(prstate, chainitems[i]);
 		}
 
 		/*
@@ -598,9 +603,28 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 		 * redirect the root to the correct chain member.
 		 */
 		if (i >= nchain)
+		{
+			if (rootoffnum != latestdead)
+			{
+				if (ItemIdIsNormal(rootlp))
+					heap_prune_record_unused(prstate, latestdead);
+				else
+				{
+					/*
+					 * We allow the Redirected and Dead item states to overlap here.
+					 */
+					heap_prune_record_redirect(prstate, rootoffnum, latestdead);
+					heap_prune_record_dead(prstate, latestdead);
+				}
+			}
 			heap_prune_record_dead(prstate, rootoffnum);
+		}
 		else
+		{
+			if (rootoffnum != latestdead)
+				heap_prune_record_unused(prstate, latestdead);
 			heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
+		}
 	}
 	else if (nchain < 2 && ItemIdIsRedirected(rootlp))
 	{
@@ -653,7 +677,12 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
 	Assert(prstate->ndead < MaxHeapTuplesPerPage);
 	prstate->nowdead[prstate->ndead] = offnum;
 	prstate->ndead++;
-	Assert(!prstate->marked[offnum]);
+
+	/*
+	 * We no longer assert !prstate->marked[offnum]. That is not ideal, but
+	 * it is the simplest way to allow the Dead and Redirected states to
+	 * overlap on the same item.
+	 */
 	prstate->marked[offnum] = true;
 }
 
@@ -706,7 +735,10 @@ heap_page_prune_execute(Buffer buffer,
 		OffsetNumber off = *offnum++;
 		ItemId		lp = PageGetItemId(page, off);
 
-		ItemIdSetDead(lp);
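+		/*
+		 * ItemIdMarkDead() keeps the line pointer's storage, so the target
+		 * deletion path can still read the dead tuple to rebuild its index
+		 * scan key; ItemIdSetDead() would truncate that storage away.
+		 */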
+		if (target_index_deletion_factor > 0)
+			ItemIdMarkDead(lp);
+		else
+			ItemIdSetDead(lp);
 	}
 
 	/* Update all now-unused line pointers */
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 5649a70..15f7e32 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -139,7 +139,6 @@ typedef struct LVRelStats
 	bool		lock_waiter_detected;
 } LVRelStats;
 
-
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
 
@@ -156,6 +155,8 @@ static void lazy_scan_heap(Relation onerel, int options,
 			   bool aggressive);
 static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
 static bool lazy_check_needs_freeze(Buffer buf, bool *hastup);
+static void quick_vacuum_index(Relation irel, Relation hrel,
+					LVRelStats *vacrelstats);
 static void lazy_vacuum_index(Relation indrel,
 				  IndexBulkDeleteResult **stats,
 				  LVRelStats *vacrelstats);
@@ -734,10 +735,17 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 			/* Remove index entries */
 			for (i = 0; i < nindexes; i++)
-				lazy_vacuum_index(Irel[i],
+			{
+				bool use_quick_strategy = (vacrelstats->num_dead_tuples/vacrelstats->old_live_tuples < target_index_deletion_factor);
+
+				if (use_quick_strategy)
+					quick_vacuum_index(Irel[i], onerel, vacrelstats);
+				else
+					lazy_vacuum_index(Irel[i],
 								  &indstats[i],
 								  vacrelstats);
 
+			}
 			/*
 			 * Report that we are now vacuuming the heap.  We also increase
 			 * the number of index scans here; note that by using
@@ -1379,10 +1387,16 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 		/* Remove index entries */
 		for (i = 0; i < nindexes; i++)
-			lazy_vacuum_index(Irel[i],
+		{
+			bool use_quick_strategy = (vacrelstats->num_dead_tuples/vacrelstats->old_live_tuples < target_index_deletion_factor);
+
+			if (use_quick_strategy)
+				quick_vacuum_index(Irel[i], onerel, vacrelstats);
+			else
+				lazy_vacuum_index(Irel[i],
 							  &indstats[i],
 							  vacrelstats);
-
+		}
 		/* Report that we are now vacuuming the heap */
 		hvp_val[0] = PROGRESS_VACUUM_PHASE_VACUUM_HEAP;
 		hvp_val[1] = vacrelstats->num_index_scans + 1;
@@ -1671,6 +1685,152 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
 	return false;
 }
 
+/*
+ * Fetch the heap tuple identified by tid, to build an index scan key from it.
+ */
+static HeapTuple
+get_tuple_by_tid(Relation rel, ItemPointer tid)
+{
+	Buffer			buffer;
+	Page			page;
+	OffsetNumber	offnum;
+	ItemId			lp;
+	HeapTuple		tuple;
+
+	buffer = ReadBufferExtended(rel, MAIN_FORKNUM, ItemPointerGetBlockNumber(tid), RBM_NORMAL, NULL);
+	LockBuffer(buffer, BUFFER_LOCK_SHARE);
+
+	page = (Page) BufferGetPage(buffer);
+	offnum = ItemPointerGetOffsetNumber(tid);
+	lp = PageGetItemId(page, offnum);
+
+	/*
+	 * VACUUM race: someone else may already have removed the tuple from
+	 * the heap. Ignore it.
+	 */
+	if (!ItemIdIsUsed(lp))
+	{
+		UnlockReleaseBuffer(buffer);
+		return NULL;
+	}
+	/* Walk along the chain */
+	while (!ItemIdHasStorage(lp))
+	{
+		offnum = ItemIdGetRedirect(lp);
+		lp = PageGetItemId(page, offnum);
+		Assert(ItemIdIsUsed(lp));
+	}
+
+	/* Form a tuple */
+	tuple = palloc(sizeof(HeapTupleData));
+	ItemPointerSet(&(tuple->t_self), BufferGetBlockNumber(buffer), offnum);
+	tuple->t_tableOid = RelationGetRelid(rel);
+	tuple->t_data = (HeapTupleHeader) PageGetItem(page, lp);
+	tuple->t_len = ItemIdGetLength(lp);
+	UnlockReleaseBuffer(buffer);
+	return tuple;
+}
+
+#include "access/nbtree.h"
+#include "catalog/index.h"
+#include "executor/executor.h"
+
+static int tid_comparator(const void* a, const void* b)
+{
+	return ItemPointerCompare((ItemPointer)a, (ItemPointer)b);
+}
+
+/*
+ *	quick_vacuum_index() -- quick vacuum one index relation.
+ *
+ *		Delete all the index entries pointing to tuples listed in
+ *		vacrelstats->dead_tuples.
+ */
+static void
+quick_vacuum_index(Relation irel, Relation hrel,
+				   LVRelStats *vacrelstats)
+{
+	IndexTargetDeleteResult	stats;
+	IndexTargetDeleteInfo	ivinfo;
+
+	if (irel->rd_amroutine->amtargetdelete != NULL)
+	{
+		int				tnum;
+		bool*			found = palloc0(vacrelstats->num_dead_tuples*sizeof(bool));
+		IndexInfo* 		indexInfo = BuildIndexInfo(irel);
+		EState*			estate = CreateExecutorState();
+		ExprContext*	econtext = GetPerTupleExprContext(estate);
+		ExprState*		predicate = ExecPrepareQual(indexInfo->ii_Predicate, estate);
+
+		ivinfo.indexRelation = irel;
+		ivinfo.heapRelation = hrel;
+		qsort((void *)vacrelstats->dead_tuples, vacrelstats->num_dead_tuples, sizeof(ItemPointerData), tid_comparator);
+		ivinfo.isSorted = true;
+
+		/* Process the dead-tuple TIDs one by one */
+		for (tnum = 0; tnum < vacrelstats->num_dead_tuples; tnum++)
+		{
+			HeapTuple		tuple;
+			TupleTableSlot*	slot;
+			Datum			values[INDEX_MAX_KEYS];
+			bool			isnull[INDEX_MAX_KEYS];
+
+			/* Index entries for this TID were already deleted earlier */
+			if (found[tnum])
+				continue;
+
+			/* Get a tuple from heap */
+			if ((tuple = get_tuple_by_tid(hrel, &(vacrelstats->dead_tuples[tnum]))) == NULL)
+			{
+				/*
+				 * The tuple's line pointer is already unused.
+				 */
+				found[tnum] = true;
+				continue;
+			}
+
+			/*
+			 * Form the values[] and isnull[] arrays for the index tuple
+			 * from the heap tuple.
+			 */
+			slot = MakeSingleTupleTableSlot(RelationGetDescr(hrel));
+			econtext->ecxt_scantuple = slot;
+
+			ExecStoreTuple(tuple, slot, InvalidBuffer, false);
+
+			/*
+			 * In a partial index, ignore tuples that don't satisfy the
+			 * predicate.
+			 */
+			if ((predicate != NULL) && (!ExecQual(predicate, econtext)))
+			{
+				ExecDropSingleTupleTableSlot(slot);
+				found[tnum] = true;
+				continue;
+			}
+
+			FormIndexDatum(indexInfo, slot, estate, values, isnull);
+
+			ExecDropSingleTupleTableSlot(slot);
+
+			/*
+			 * Attempt to delete several index entries in one tree descent.
+			 * We pass only the tail of the TID list, which holds the TIDs
+			 * not yet found.
+			 */
+			ivinfo.dead_tuples = &(vacrelstats->dead_tuples[tnum]);
+			ivinfo.num_dead_tuples = vacrelstats->num_dead_tuples-tnum;
+			ivinfo.found_dead_tuples = found+tnum;
+			index_target_delete(&ivinfo, &stats, values, isnull);
+		}
+
+		pfree(found);
+		FreeExecutorState(estate);
+	}
+	else
+	{
+		IndexBulkDeleteResult *stats = NULL;
+
+		lazy_vacuum_index(irel, &stats, vacrelstats);
+	}
+}
 
 /*
  *	lazy_vacuum_index() -- vacuum one index relation.
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index fa3c8a7..3f6c83b 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3236,7 +3236,15 @@ static struct config_real ConfigureNamesReal[] =
 		0.1, 0.0, 100.0,
 		NULL, NULL, NULL
 	},
-
+	{
+		{"target_index_deletion_factor", PGC_SIGHUP, AUTOVACUUM,
+			gettext_noop("Maximum fraction of dead tuples, relative to reltuples, below which the target index deletion strategy is used."),
+			NULL
+		},
+		&target_index_deletion_factor,
+		0.01, 0.0, 1.0,
+		NULL, NULL, NULL
+	},
 	{
 		{"checkpoint_completion_target", PGC_SIGHUP, WAL_CHECKPOINTS,
 			gettext_noop("Time spent flushing dirty buffers during checkpoint, as fraction of checkpoint interval."),
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ca5cad7..c5a7bc8 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -100,6 +100,7 @@ extern Relation heap_openrv_extended(const RangeVar *relation,
 typedef struct HeapScanDescData *HeapScanDesc;
 typedef struct ParallelHeapScanDescData *ParallelHeapScanDesc;
 
+extern double target_index_deletion_factor;
 /*
  * HeapScanIsValid
  *		True iff the heap scan is valid.
-- 
2.7.4
