On Fri, Jan 20, 2017 at 6:24 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Thu, Jan 19, 2017 at 8:31 PM, Claudio Freire <klaussfre...@gmail.com> 
> wrote:
>> On Thu, Jan 19, 2017 at 6:33 AM, Anastasia Lubennikova
>> <a.lubennik...@postgrespro.ru> wrote:
>>> 28.12.2016 23:43, Claudio Freire:
>>>
>>> Attached v4 patches with the requested fixes.
>>>
>>>
>>> Sorry for being late, but the tests took a lot of time.
>>
>> I know. Takes me several days to run my test scripts once.
>>
>>> create table t1 as select i, md5(random()::text) from
>>> generate_series(0,400000000) as i;
>>> create index md5_idx ON  t1(md5);
>>> update t1 set md5 = md5((random() * (100 + 500))::text);
>>> vacuum;
>>>
>>> Patched vacuum used 2.9GB of memory and vacuumed the index in one pass,
>>> while the old version took three passes (1GB+1GB+0.9GB).
>>> Vacuum duration results:
>>>
>>> vanilla:
>>> LOG: duration: 4359006.327 ms  statement: vacuum verbose t1;
>>> patched:
>>> LOG: duration: 3076827.378 ms  statement: vacuum verbose t1;
>>>
>>> We can see a 30% vacuum speedup. I should note that this case can be
>>> considered favorable to vanilla vacuum: the table is not that big, it has
>>> just one index, and the disk used is a fast FusionIO. We can expect even
>>> more gain on slower disks.
>>>
>>> Thank you again for the patch. Hope to see it in 10.0.
>>
>> Cool. Thanks for the review and the tests.
>>
>
> I encountered a bug with the following scenario.
> 1. Create table and disable autovacuum on that table.
> 2. Make about 200000 dead tuples on the table.
> 3. SET maintenance_work_mem TO 1024
> 4. VACUUM
>
> @@ -729,7 +759,7 @@ lazy_scan_heap(Relation onerel, int options,
> LVRelStats *vacrelstats,
>                          * not to reset latestRemovedXid since we want
> that value to be
>                          * valid.
>                          */
> -                       vacrelstats->num_dead_tuples = 0;
> +                       lazy_clear_dead_tuples(vacrelstats);
>                         vacrelstats->num_index_scans++;
>
>                         /* Report that we are once again scanning the heap */
>
> I think that we should do vacrelstats->dead_tuples.num_entries = 0 as
> well in lazy_clear_dead_tuples(). Once the amount of dead tuples
> reaches maintenance_work_mem, lazy_scan_heap can never finish.

That's right.

I added a test for it in the attached patch set, which uncovered
another bug in lazy_clear_dead_tuples, and took the opportunity to
rebase.
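
For reference, outside the regression suite the scenario boils down to
something like this (table and index names here are made up; the new test
in vacuum.sql does essentially the same thing):

CREATE TABLE repro (i int) WITH (autovacuum_enabled = off);
INSERT INTO repro SELECT g FROM generate_series(1, 200000) g;
CREATE INDEX repro_idx ON repro (i);
DELETE FROM repro;                  -- leaves ~200000 dead tuples behind
SET maintenance_work_mem = 1024;    -- 1MB, the minimum; forces an index pass mid-scan
VACUUM repro;                       -- never finished before the lazy_clear_dead_tuples fix
SET maintenance_work_mem TO DEFAULT;
DROP TABLE repro;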

On Mon, Jan 23, 2017 at 1:06 PM, Alvaro Herrera
<alvhe...@2ndquadrant.com> wrote:
> I pushed this patch after rewriting it rather completely.  I added
> tracing notices to inspect the blocks it was prefetching and observed
> that the original coding was failing to prefetch the final streak of
> blocks in the table, which is an important oversight considering that it
> may very well be that those are the only blocks to read at all.
>
> I timed vacuuming a 4000-block table in my laptop (single SSD disk;
> dropped FS caches after deleting all rows in table, so that vacuum has
> to read all blocks from disk); it changes from 387ms without patch to
> 155ms with patch.  I didn't measure how much it takes to run the other
> steps in the vacuum, but it's clear that the speedup for the truncation
> phase is considerable.
>
> ¡Thanks, Claudio!

Cool.

Though it wasn't the first time this idea had been floated around, so I
can't take all the credit.
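
For the archives, the gist of the prefetching (as I understand the committed
version, it happens while count_nondeletable_pages() scans the tail of the
relation) is simply to tell the kernel ahead of time which blocks we are about
to read. A rough standalone sketch of that idea, not the committed code, using
posix_fadvise() directly:

/*
 * Rough standalone illustration only, NOT the committed code: hint the
 * kernel that a streak of 8kB blocks is about to be read, the way
 * PrefetchBuffer() does via posix_fadvise() where it's available.
 */
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BLCKSZ        8192	/* PostgreSQL heap block size */
#define PREFETCH_SIZE 32	/* same 32-block chunking vacuum uses */

int
main(int argc, char **argv)
{
	int		fd;
	long	startblk, nblocks, blk;

	if (argc != 4)
	{
		fprintf(stderr, "usage: %s relfile startblk nblocks\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0)
	{
		perror("open");
		return 1;
	}
	startblk = atol(argv[2]);
	nblocks = atol(argv[3]);

	/* issue WILLNEED hints one PREFETCH_SIZE chunk at a time */
	for (blk = startblk; blk < startblk + nblocks; blk += PREFETCH_SIZE)
	{
		long	chunk = startblk + nblocks - blk;

		if (chunk > PREFETCH_SIZE)
			chunk = PREFETCH_SIZE;
		(void) posix_fadvise(fd, (off_t) blk * BLCKSZ,
							 (off_t) chunk * BLCKSZ, POSIX_FADV_WILLNEED);
	}
	close(fd);
	return 0;
}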


On Fri, Jan 20, 2017 at 6:25 PM, Alvaro Herrera
<alvhe...@2ndquadrant.com> wrote:
> FWIW, I think this patch is completely separate from the maint_work_mem
> patch and should have had its own thread and its own commitfest entry.
> I intend to get a look at the other patch next week, after pushing this
> one.

That's because it did have its own thread and commitfest entry, but it was
left in limbo due to lack of testing on SSDs. I just had to adopt it here
because otherwise my tests took way too long.
From 92c610b4ed064afba0f914efd78a03c61f4742c6 Mon Sep 17 00:00:00 2001
From: Claudio Freire <klaussfre...@gmail.com>
Date: Mon, 12 Sep 2016 23:36:42 -0300
Subject: [PATCH] Vacuum: allow using more than 1GB work mem

Turn the dead_tuples array into a structure composed of several
exponentially bigger arrays, to enable usage of more than 1GB
of work mem during vacuum and thus reduce the number of full
index scans necessary to remove all dead tids when the memory is
available.
---
 src/backend/commands/vacuumlazy.c    | 288 ++++++++++++++++++++++++++++-------
 src/test/regress/expected/vacuum.out |   8 +
 src/test/regress/sql/vacuum.sql      |  10 ++
 3 files changed, 247 insertions(+), 59 deletions(-)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 005440e..0b1d347 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -93,6 +93,14 @@
 #define LAZY_ALLOC_TUPLES		MaxHeapTuplesPerPage
 
 /*
+ * Minimum (starting) size of the dead_tuples array segments. Will allocate
+ * space for 128MB worth of tid pointers in the first segment, further segments
+ * will grow in size exponentially. Don't make it too small or the segment list
+ * will grow bigger than the sweetspot for search efficiency on big vacuums.
+ */
+#define LAZY_MIN_TUPLES		Max(MaxHeapTuplesPerPage, (128<<20) / sizeof(ItemPointerData))
+
+/*
  * Before we consider skipping a page that's marked as clean in
  * visibility map, we must've seen at least this many clean pages.
  */
@@ -104,6 +112,27 @@
  */
 #define PREFETCH_SIZE			((BlockNumber) 32)
 
+typedef struct DeadTuplesSegment
+{
+	int			num_dead_tuples;	/* # of entries in the segment */
+	int			max_dead_tuples;	/* # of entries allocated in the segment */
+	ItemPointerData last_dead_tuple;	/* Copy of the last dead tuple (unset
+										 * until the segment is fully
+										 * populated) */
+	unsigned short padding;
+	ItemPointer dead_tuples;	/* Array of dead tuples */
+}	DeadTuplesSegment;
+
+typedef struct DeadTuplesMultiArray
+{
+	int			num_entries;	/* current # of entries */
+	int			max_entries;	/* total # of slots that can be allocated in
+								 * array */
+	int			num_segs;		/* number of dead tuple segments allocated */
+	int			last_seg;		/* last dead tuple segment with data (or 0) */
+	DeadTuplesSegment *dead_tuples;		/* array of num_segs segments */
+}	DeadTuplesMultiArray;
+
 typedef struct LVRelStats
 {
 	/* hasindex = true means two-pass strategy; false means one-pass */
@@ -123,14 +152,14 @@ typedef struct LVRelStats
 	BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
 	/* List of TIDs of tuples we intend to delete */
 	/* NB: this list is ordered by TID address */
-	int			num_dead_tuples;	/* current # of entries */
-	int			max_dead_tuples;	/* # slots allocated in array */
-	ItemPointer dead_tuples;	/* array of ItemPointerData */
+	DeadTuplesMultiArray dead_tuples;
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+#define DeadTuplesCurrentSegment(lvrelstats) (&((lvrelstats)->dead_tuples.dead_tuples[ \
+	(lvrelstats)->dead_tuples.last_seg ]))
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -155,7 +184,7 @@ static void lazy_cleanup_index(Relation indrel,
 				   IndexBulkDeleteResult *stats,
 				   LVRelStats *vacrelstats);
 static int lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
-				 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
+				 int tupindex, LVRelStats *vacrelstats, DeadTuplesSegment * seg, Buffer *vmbuffer);
 static bool should_attempt_truncation(LVRelStats *vacrelstats);
 static void lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats);
 static BlockNumber count_nondeletable_pages(Relation onerel,
@@ -163,8 +192,9 @@ static BlockNumber count_nondeletable_pages(Relation onerel,
 static void lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks);
 static void lazy_record_dead_tuple(LVRelStats *vacrelstats,
 					   ItemPointer itemptr);
+static void lazy_clear_dead_tuples(LVRelStats *vacrelstats);
 static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
-static int	vac_cmp_itemptr(const void *left, const void *right);
+static inline int vac_cmp_itemptr(const void *left, const void *right);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 					 TransactionId *visibility_cutoff_xid, bool *all_frozen);
 
@@ -504,7 +534,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	/* Report that we're scanning the heap, advertising total # of blocks */
 	initprog_val[0] = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
 	initprog_val[1] = nblocks;
-	initprog_val[2] = vacrelstats->max_dead_tuples;
+	initprog_val[2] = vacrelstats->dead_tuples.max_entries;
 	pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
 
 	/*
@@ -683,8 +713,8 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		 * If we are close to overrunning the available space for dead-tuple
 		 * TIDs, pause and do a cycle of vacuuming before we tackle this page.
 		 */
-		if ((vacrelstats->max_dead_tuples - vacrelstats->num_dead_tuples) < MaxHeapTuplesPerPage &&
-			vacrelstats->num_dead_tuples > 0)
+		if ((vacrelstats->dead_tuples.max_entries - vacrelstats->dead_tuples.num_entries) < MaxHeapTuplesPerPage &&
+			vacrelstats->dead_tuples.num_entries > 0)
 		{
 			const int	hvp_index[] = {
 				PROGRESS_VACUUM_PHASE,
@@ -735,7 +765,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			 * not to reset latestRemovedXid since we want that value to be
 			 * valid.
 			 */
-			vacrelstats->num_dead_tuples = 0;
+			lazy_clear_dead_tuples(vacrelstats);
 			vacrelstats->num_index_scans++;
 
 			/* Report that we are once again scanning the heap */
@@ -917,7 +947,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		has_dead_tuples = false;
 		nfrozen = 0;
 		hastup = false;
-		prev_dead_count = vacrelstats->num_dead_tuples;
+		prev_dead_count = vacrelstats->dead_tuples.num_entries;
 		maxoff = PageGetMaxOffsetNumber(page);
 
 		/*
@@ -1129,10 +1159,13 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		 * instead of doing a second scan.
 		 */
 		if (nindexes == 0 &&
-			vacrelstats->num_dead_tuples > 0)
+			vacrelstats->dead_tuples.num_entries > 0)
 		{
 			/* Remove tuples from heap */
-			lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, &vmbuffer);
+			Assert(vacrelstats->dead_tuples.last_seg == 0);		/* Should not need more
+																 * than one segment per
+																 * page */
+			lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, DeadTuplesCurrentSegment(vacrelstats), &vmbuffer);
 			has_dead_tuples = false;
 
 			/*
@@ -1140,7 +1173,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			 * not to reset latestRemovedXid since we want that value to be
 			 * valid.
 			 */
-			vacrelstats->num_dead_tuples = 0;
+			lazy_clear_dead_tuples(vacrelstats);
 			vacuumed_pages++;
 		}
 
@@ -1243,7 +1276,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		 * page, so remember its free space as-is.  (This path will always be
 		 * taken if there are no indexes.)
 		 */
-		if (vacrelstats->num_dead_tuples == prev_dead_count)
+		if (vacrelstats->dead_tuples.num_entries == prev_dead_count)
 			RecordPageWithFreeSpace(onerel, blkno, freespace);
 	}
 
@@ -1274,7 +1307,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 	/* If any tuples need to be deleted, perform final vacuum cycle */
 	/* XXX put a threshold on min number of tuples here? */
-	if (vacrelstats->num_dead_tuples > 0)
+	if (vacrelstats->dead_tuples.num_entries > 0)
 	{
 		const int	hvp_index[] = {
 			PROGRESS_VACUUM_PHASE,
@@ -1368,7 +1401,9 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 static void
 lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 {
+	int			tottuples;
 	int			tupindex;
+	int			segindex;
 	int			npages;
 	PGRUsage	ru0;
 	Buffer		vmbuffer = InvalidBuffer;
@@ -1376,35 +1411,43 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 	pg_rusage_init(&ru0);
 	npages = 0;
 
-	tupindex = 0;
-	while (tupindex < vacrelstats->num_dead_tuples)
+	segindex = 0;
+	tottuples = 0;
+	for (segindex = tupindex = 0; segindex <= vacrelstats->dead_tuples.last_seg; tupindex = 0, segindex++)
 	{
-		BlockNumber tblk;
-		Buffer		buf;
-		Page		page;
-		Size		freespace;
-
-		vacuum_delay_point();
+		DeadTuplesSegment *seg = &(vacrelstats->dead_tuples.dead_tuples[segindex]);
+		int			num_dead_tuples = seg->num_dead_tuples;
 
-		tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
-		buf = ReadBufferExtended(onerel, MAIN_FORKNUM, tblk, RBM_NORMAL,
-								 vac_strategy);
-		if (!ConditionalLockBufferForCleanup(buf))
+		while (tupindex < num_dead_tuples)
 		{
-			ReleaseBuffer(buf);
-			++tupindex;
-			continue;
-		}
-		tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, vacrelstats,
-									&vmbuffer);
+			BlockNumber tblk;
+			Buffer		buf;
+			Page		page;
+			Size		freespace;
 
-		/* Now that we've compacted the page, record its available space */
-		page = BufferGetPage(buf);
-		freespace = PageGetHeapFreeSpace(page);
+			vacuum_delay_point();
 
-		UnlockReleaseBuffer(buf);
-		RecordPageWithFreeSpace(onerel, tblk, freespace);
-		npages++;
+			tblk = ItemPointerGetBlockNumber(&seg->dead_tuples[tupindex]);
+			buf = ReadBufferExtended(onerel, MAIN_FORKNUM, tblk, RBM_NORMAL,
+									 vac_strategy);
+			if (!ConditionalLockBufferForCleanup(buf))
+			{
+				ReleaseBuffer(buf);
+				++tupindex;
+				continue;
+			}
+			tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, vacrelstats,
+										seg, &vmbuffer);
+
+			/* Now that we've compacted the page, record its available space */
+			page = BufferGetPage(buf);
+			freespace = PageGetHeapFreeSpace(page);
+
+			UnlockReleaseBuffer(buf);
+			RecordPageWithFreeSpace(onerel, tblk, freespace);
+			npages++;
+		}
+		tottuples += tupindex;
 	}
 
 	if (BufferIsValid(vmbuffer))
@@ -1416,7 +1459,7 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 	ereport(elevel,
 			(errmsg("\"%s\": removed %d row versions in %d pages",
 					RelationGetRelationName(onerel),
-					tupindex, npages),
+					tottuples, npages),
 			 errdetail("%s.",
 					   pg_rusage_show(&ru0))));
 }
@@ -1427,34 +1470,36 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
  *
  * Caller must hold pin and buffer cleanup lock on the buffer.
  *
- * tupindex is the index in vacrelstats->dead_tuples of the first dead
+ * tupindex is the index in seg->dead_tuples of the first dead
  * tuple for this page.  We assume the rest follow sequentially.
  * The return value is the first tupindex after the tuples of this page.
  */
 static int
 lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
-				 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer)
+				 int tupindex, LVRelStats *vacrelstats, DeadTuplesSegment * seg, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxOffsetNumber];
 	int			uncnt = 0;
 	TransactionId visibility_cutoff_xid;
 	bool		all_frozen;
+	ItemPointer dead_tuples = seg->dead_tuples;
+	int			num_dead_tuples = seg->num_dead_tuples;
 
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
 
 	START_CRIT_SECTION();
 
-	for (; tupindex < vacrelstats->num_dead_tuples; tupindex++)
+	for (; tupindex < num_dead_tuples; tupindex++)
 	{
 		BlockNumber tblk;
 		OffsetNumber toff;
 		ItemId		itemid;
 
-		tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
+		tblk = ItemPointerGetBlockNumber(&dead_tuples[tupindex]);
 		if (tblk != blkno)
 			break;				/* past end of tuples for this block */
-		toff = ItemPointerGetOffsetNumber(&vacrelstats->dead_tuples[tupindex]);
+		toff = ItemPointerGetOffsetNumber(&dead_tuples[tupindex]);
 		itemid = PageGetItemId(page, toff);
 		ItemIdSetUnused(itemid);
 		unused[uncnt++] = toff;
@@ -1588,6 +1633,7 @@ lazy_vacuum_index(Relation indrel,
 {
 	IndexVacuumInfo ivinfo;
 	PGRUsage	ru0;
+	DeadTuplesSegment *seg;
 
 	pg_rusage_init(&ru0);
 
@@ -1598,6 +1644,11 @@ lazy_vacuum_index(Relation indrel,
 	ivinfo.num_heap_tuples = vacrelstats->old_rel_tuples;
 	ivinfo.strategy = vac_strategy;
 
+	/* Finalize the current segment by setting its upper bound dead tuple */
+	seg = DeadTuplesCurrentSegment(vacrelstats);
+	if (seg->num_dead_tuples > 0)
+		seg->last_dead_tuple = seg->dead_tuples[seg->num_dead_tuples - 1];
+
 	/* Do bulk deletion */
 	*stats = index_bulk_delete(&ivinfo, *stats,
 							   lazy_tid_reaped, (void *) vacrelstats);
@@ -1605,7 +1656,7 @@ lazy_vacuum_index(Relation indrel,
 	ereport(elevel,
 			(errmsg("scanned index \"%s\" to remove %d row versions",
 					RelationGetRelationName(indrel),
-					vacrelstats->num_dead_tuples),
+					vacrelstats->dead_tuples.num_entries),
 			 errdetail("%s.", pg_rusage_show(&ru0))));
 }
 
@@ -1973,7 +2024,8 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats)
 static void
 lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks)
 {
-	long		maxtuples;
+	long		maxtuples,
+				mintuples;
 	int			vac_work_mem = IsAutoVacuumWorkerProcess() &&
 	autovacuum_work_mem != -1 ?
 	autovacuum_work_mem : maintenance_work_mem;
@@ -1982,7 +2034,6 @@ lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks)
 	{
 		maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
 		maxtuples = Min(maxtuples, INT_MAX);
-		maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
 
 		/* curious coding here to ensure the multiplication can't overflow */
 		if ((BlockNumber) (maxtuples / LAZY_ALLOC_TUPLES) > relblocks)
@@ -1996,10 +2047,18 @@ lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks)
 		maxtuples = MaxHeapTuplesPerPage;
 	}
 
-	vacrelstats->num_dead_tuples = 0;
-	vacrelstats->max_dead_tuples = (int) maxtuples;
-	vacrelstats->dead_tuples = (ItemPointer)
-		palloc(maxtuples * sizeof(ItemPointerData));
+	mintuples = Min(LAZY_MIN_TUPLES, maxtuples);
+
+	vacrelstats->dead_tuples.num_entries = 0;
+	vacrelstats->dead_tuples.max_entries = (int) maxtuples;
+	vacrelstats->dead_tuples.num_segs = 1;
+	vacrelstats->dead_tuples.last_seg = 0;
+	vacrelstats->dead_tuples.dead_tuples = (DeadTuplesSegment *)
+		palloc(sizeof(DeadTuplesSegment));
+	vacrelstats->dead_tuples.dead_tuples[0].dead_tuples = (ItemPointer)
+		palloc(mintuples * sizeof(ItemPointerData));
+	vacrelstats->dead_tuples.dead_tuples[0].max_dead_tuples = mintuples;
+	vacrelstats->dead_tuples.dead_tuples[0].num_dead_tuples = 0;
 }
 
 /*
@@ -2014,16 +2073,86 @@ lazy_record_dead_tuple(LVRelStats *vacrelstats,
 	 * could if we are given a really small maintenance_work_mem. In that
 	 * case, just forget the last few tuples (we'll get 'em next time).
 	 */
-	if (vacrelstats->num_dead_tuples < vacrelstats->max_dead_tuples)
+	if (vacrelstats->dead_tuples.num_entries < vacrelstats->dead_tuples.max_entries)
 	{
-		vacrelstats->dead_tuples[vacrelstats->num_dead_tuples] = *itemptr;
-		vacrelstats->num_dead_tuples++;
+		DeadTuplesSegment *seg = DeadTuplesCurrentSegment(vacrelstats);
+
+		if (seg->num_dead_tuples >= seg->max_dead_tuples)
+		{
+			/*
+			 * The segment is overflowing, so we must allocate a new segment.
+			 * We could have a preallocated segment descriptor already, in
+			 * which case we just reinitialize it, or we may need to repalloc
+			 * the vacrelstats->dead_tuples array. In that case, seg will no
+			 * longer be valid, so we must be careful about that. In any case,
+			 * we must update the last_dead_tuple copy in the overflowing
+			 * segment descriptor.
+			 */
+			Assert(seg->num_dead_tuples == seg->max_dead_tuples);
+			seg->last_dead_tuple = seg->dead_tuples[seg->num_dead_tuples - 1];
+			if (vacrelstats->dead_tuples.last_seg + 1 >= vacrelstats->dead_tuples.num_segs)
+			{
+				int			new_num_segs = vacrelstats->dead_tuples.num_segs * 2;
+
+				vacrelstats->dead_tuples.dead_tuples = (DeadTuplesSegment *) repalloc(
+							   (void *) vacrelstats->dead_tuples.dead_tuples,
+								   new_num_segs * sizeof(DeadTuplesSegment));
+				while (vacrelstats->dead_tuples.num_segs < new_num_segs)
+				{
+					/* Initialize as "unallocated" */
+					DeadTuplesSegment *nseg = &(vacrelstats->dead_tuples.dead_tuples[
+										 vacrelstats->dead_tuples.num_segs]);
+
+					nseg->num_dead_tuples = 0;
+					nseg->max_dead_tuples = 0;
+					nseg->dead_tuples = NULL;
+					vacrelstats->dead_tuples.num_segs++;
+				}
+				seg = DeadTuplesCurrentSegment(vacrelstats);
+			}
+			vacrelstats->dead_tuples.last_seg++;
+			seg = DeadTuplesCurrentSegment(vacrelstats);
+			if (seg->dead_tuples == NULL)
+			{
+				/*
+				 * If unallocated, allocate up to twice the amount of the
+				 * previous segment
+				 */
+				DeadTuplesSegment *pseg = seg - 1;
+				int			numtuples = vacrelstats->dead_tuples.max_entries - vacrelstats->dead_tuples.num_entries;
+
+				numtuples = Max(numtuples, MaxHeapTuplesPerPage);
+				numtuples = Min(numtuples, INT_MAX / 2);
+				numtuples = Min(numtuples, 2 * pseg->max_dead_tuples);
+				numtuples = Min(numtuples, MaxAllocSize / sizeof(ItemPointerData));
+				seg->dead_tuples = (ItemPointer) palloc(sizeof(ItemPointerData) * numtuples);
+				seg->max_dead_tuples = numtuples;
+			}
+			seg->num_dead_tuples = 0;
+		}
+		seg->dead_tuples[seg->num_dead_tuples] = *itemptr;
+		vacrelstats->dead_tuples.num_entries++;
+		seg->num_dead_tuples++;
 		pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
-									 vacrelstats->num_dead_tuples);
+									 vacrelstats->dead_tuples.num_entries);
 	}
 }
 
 /*
+ * lazy_clear_dead_tuples - reset all dead tuples segments
+ */
+static void
+lazy_clear_dead_tuples(LVRelStats *vacrelstats)
+{
+	int			nseg;
+
+	for (nseg = 0; nseg < vacrelstats->dead_tuples.num_segs; nseg++)
+		vacrelstats->dead_tuples.dead_tuples[nseg].num_dead_tuples = 0;
+	vacrelstats->dead_tuples.last_seg = 0;
+	vacrelstats->dead_tuples.num_entries = 0;
+}
+
+/*
  *	lazy_tid_reaped() -- is a particular tid deletable?
  *
  *		This has the right signature to be an IndexBulkDeleteCallback.
@@ -2035,10 +2164,51 @@ lazy_tid_reaped(ItemPointer itemptr, void *state)
 {
 	LVRelStats *vacrelstats = (LVRelStats *) state;
 	ItemPointer res;
+	DeadTuplesSegment *seg,
+			   *rseg;
+	int			nseg,
+				last_seg;
+
+	if (vacrelstats->dead_tuples.num_segs == 0)
+		return false;
+
+	/*
+	 * Do a sequential search to find the first segment that could contain the
+	 * item pointer. Given the number of segments involved, binary search
+	 * would be pointless.
+	 */
+	seg = vacrelstats->dead_tuples.dead_tuples;
+	rseg = NULL;
+	last_seg = vacrelstats->dead_tuples.last_seg;
+	for (nseg = 0; nseg <= last_seg; nseg++, seg++)
+	{
+		int			rcomp = vac_cmp_itemptr((void *) itemptr, (void *) &(seg->last_dead_tuple));
+
+		if (rcomp == 0)
+		{
+			/* Found it, what are the chances! */
+			return true;
+		}
+		else if (rcomp < 0)
+		{
+			/* Found the right segment, further segments won't overlap */
+			rseg = seg;
+			break;
+		}
+	}
+	if (rseg == NULL || rseg->num_dead_tuples == 0)
+		return false;
+
+	/*
+	 * Quickly rule out by lower bound (should happen a lot) Upper bound was
+	 * already checked by segment search
+	 */
+	if (vac_cmp_itemptr((void *) itemptr, (void *) rseg->dead_tuples) < 0)
+		return false;
 
 	res = (ItemPointer) bsearch((void *) itemptr,
-								(void *) vacrelstats->dead_tuples,
-								vacrelstats->num_dead_tuples,
+								(void *) rseg->dead_tuples,
+								rseg->num_dead_tuples,
 								sizeof(ItemPointerData),
 								vac_cmp_itemptr);
 
@@ -2048,7 +2218,7 @@ lazy_tid_reaped(ItemPointer itemptr, void *state)
 /*
  * Comparator routines for use with qsort() and bsearch().
  */
-static int
+static inline int
 vac_cmp_itemptr(const void *left, const void *right)
 {
 	BlockNumber lblk,
diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out
index 9b604be..7198fe4 100644
--- a/src/test/regress/expected/vacuum.out
+++ b/src/test/regress/expected/vacuum.out
@@ -81,4 +81,12 @@ SQL function "wrap_do_analyze" statement 1
 VACUUM FULL vactst;
 VACUUM (DISABLE_PAGE_SKIPPING) vaccluster;
 DROP TABLE vaccluster;
+INSERT INTO vactst SELECT * from generate_series(1,400000);
+CREATE INDEX ix_vactst ON vactst (i);
+DELETE FROM vactst WHERE i in (SELECT i FROM vactst ORDER BY random() LIMIT 300000);
+SET maintenance_work_mem = 1024;
+VACUUM vactst;
+SET maintenance_work_mem TO DEFAULT;
+DROP INDEX ix_vactst;
+TRUNCATE TABLE vactst;
 DROP TABLE vactst;
diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql
index 7b819f6..dcec617 100644
--- a/src/test/regress/sql/vacuum.sql
+++ b/src/test/regress/sql/vacuum.sql
@@ -63,4 +63,14 @@ VACUUM FULL vactst;
 VACUUM (DISABLE_PAGE_SKIPPING) vaccluster;
 
 DROP TABLE vaccluster;
+
+INSERT INTO vactst SELECT * from generate_series(1,400000);
+CREATE INDEX ix_vactst ON vactst (i);
+DELETE FROM vactst WHERE i in (SELECT i FROM vactst ORDER BY random() LIMIT 300000);
+SET maintenance_work_mem = 1024;
+VACUUM vactst;
+SET maintenance_work_mem TO DEFAULT;
+DROP INDEX ix_vactst;
+TRUNCATE TABLE vactst;
+
 DROP TABLE vactst;
-- 
1.8.4.5
