Simon Riggs wrote:
On Tue, 2009-01-20 at 11:03 +0200, Heikki Linnakangas wrote:
- Are there some conditions where whole-table-scanning vacuum is more
  effective than vacuums using visibility map? If so, we should switch
  to full-scan *automatically*, without relying on user configurations.

Hmm, the only downside I can see is that skipping a page here and
there could defeat the OS read-ahead. Perhaps we should call
posix_fadvise() with POSIX_FADV_SEQUENTIAL to compensate. Or, we could
modify the logic to only skip pages when there are at least N
consecutive pages that can be skipped.
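
For reference, a minimal sketch of the read-ahead hint mentioned above. The helper name advise_sequential() is hypothetical, and how vacuum would obtain the file descriptor is left out, since that is not discussed in this thread.

```c
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>

/*
 * Hint to the kernel that fd will be read sequentially.
 * Offset 0 with length 0 means "from the start to the end of the file".
 * Returns 0 on success, or an error number on failure; posix_fadvise()
 * reports errors through its return value rather than errno.
 *
 * advise_sequential() is a hypothetical helper, not a PostgreSQL function.
 */
static int
advise_sequential(int fd)
{
	return posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
}
```

The hint is advisory only, so the kernel is free to ignore it.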

I would rather we didn't skip any pages at all unless the gains are
significant. Skipping the odd page makes no difference from a
performance perspective but may have a robustness impact.

"Significant gains" should take into account the size of both heap and
indexes, and recognise that we still scan whole indexes in either case.

That sounds pretty complex, approaching what the planner does. I'd rather keep it simple.

Attached is a simple patch that only starts skipping pages once it reaches a streak of 20 consecutive pages marked as visible in the visibility map. It doesn't do any "look-ahead", so it always scans the first 19 pages of a table before it can start skipping, and whenever there's even one page that needs vacuuming, the next 19 pages will be vacuumed as well.

We could adjust the threshold of 20 according to table size, or derive it from seq_page_cost/random_page_cost. But I'm leaning towards a simple hard-coded value for now.
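
To see how the streak rule behaves, here is a small standalone model of it, outside the backend. It is only a sketch: the all_visible[] array stands in for visibilitymap_test(), and the threshold parameter and the function name count_scanned_pages() are assumptions made for experimenting; the patch itself hard-codes 20.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Model of the skip rule: a page is skipped only when it is all-visible
 * and it extends the current run of all-visible pages to at least
 * "threshold". Returns the number of pages that would still be scanned.
 *
 * count_scanned_pages() and "threshold" are hypothetical names used for
 * illustration; all_visible[] replaces the visibility map lookup.
 */
static size_t
count_scanned_pages(const bool *all_visible, size_t nblocks, size_t threshold)
{
	size_t		streak = 0;
	size_t		scanned = 0;
	size_t		blkno;

	assert(threshold > 0);

	for (blkno = 0; blkno < nblocks; blkno++)
	{
		if (all_visible[blkno])
		{
			streak++;
			if (streak >= threshold)
				continue;		/* page is skipped */
		}
		else
			streak = 0;			/* a dirty page resets the run */

		scanned++;
	}

	return scanned;
}
```

In this model, a 100-page table with every page marked visible and a threshold of 20 has 19 pages scanned; the 20th consecutive all-visible page is the first one skipped. Marking a single page in the middle as not visible adds that page plus the 19 that follow it.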

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
*** src/backend/commands/vacuumlazy.c
--- src/backend/commands/vacuumlazy.c
***************
*** 271,276 **** lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
--- 271,277 ----
  	int			i;
  	PGRUsage	ru0;
  	Buffer		vmbuffer = InvalidBuffer;
+ 	BlockNumber	all_visible_streak;
  
  	pg_rusage_init(&ru0);
  
***************
*** 292,297 **** lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
--- 293,299 ----
  
  	lazy_space_alloc(vacrelstats, nblocks);
  
+ 	all_visible_streak = 0;
  	for (blkno = 0; blkno < nblocks; blkno++)
  	{
  		Buffer		buf;
***************
*** 309,315 **** lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
  
  		/*
  		 * Skip pages that don't require vacuuming according to the
! 		 * visibility map.
  		 */
  		if (!scan_all)
  		{
--- 311,324 ----
  
  		/*
  		 * Skip pages that don't require vacuuming according to the
! 		 * visibility map. But only if we've seen a streak of at least
! 		 * 20 pages marked as clean. Since we're reading sequentially,
! 		 * the OS should be doing readahead for us and there's no gain
! 		 * in skipping a page now and then. You need a longer run of
! 		 * consecutive skipped pages before it's worthwhile. Also,
! 		 * skipping even a single page means that we can't update
! 		 * relfrozenxid or reltuples, so we only want to do it if
! 		 * there's a good chance to skip a goodly number of pages.
  		 */
  		if (!scan_all)
  		{
***************
*** 317,325 **** lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
  				visibilitymap_test(onerel, blkno, &vmbuffer);
  			if (all_visible_according_to_vm)
  			{
! 				vacrelstats->scanned_all = false;
! 				continue;
  			}
  		}
  
  		vacuum_delay_point();
--- 326,340 ----
  				visibilitymap_test(onerel, blkno, &vmbuffer);
  			if (all_visible_according_to_vm)
  			{
! 				all_visible_streak++;
! 				if (all_visible_streak >= 20)
! 				{
! 					vacrelstats->scanned_all = false;
! 					continue;
! 				}
  			}
+ 			else
+ 				all_visible_streak = 0;
  		}
  
  		vacuum_delay_point();
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
