[HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)

Abhijit Menon-Sen Wed, 02 Apr 2014 14:42:51 -0700

This is a follow-up to the thread at
http://www.postgresql.org/message-id/4eb5fa1b.1090...@2ndquadrant.com


A quick summary: that thread proposed adding a relation_free_space()
function to the pageinspect extension. Various review comments were
received, among which was the suggestion that the code belonged in
pg_stattuple as a faster way to calculate free_percent.

===

I've attached an extension that produces largely pgstattuple-compatible
numbers for a table without doing a full-table scan.

It scans through the table, skipping blocks that have their visibility
map bit set. For such pages, it gets the free space from the free space
map, and assumes that all remaining space on the page is taken by live
tuples. It scans other pages tuple-by-tuple and counts live and dead
tuples and free space.

Here's a comparison of fastbloat and pgstattuple output on a 50-million
row table with some holes created with a single big DELETE statement:

ams=# select * from fastbloat('x');
 table_len  | scanned_percent | approx_tuple_count | approx_tuple_len | 
approx_tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | 
free_space | free_percent 
------------+-----------------+--------------------+------------------+----------------------+------------------+----------------+--------------------+------------+--------------
 6714761216 |              17 |           41318301 |       5483815648 |         
       81.67 |          8681708 |     1111258624 |              16.55 |   
80972912 |         1.21
(1 row)

Time: 639.455 ms

ams=# select * from pgstattuple('x');
 table_len  | tuple_count | tuple_len  | tuple_percent | dead_tuple_count | 
dead_tuple_len | dead_tuple_percent | free_space | free_percent 
------------+-------------+------------+---------------+------------------+----------------+--------------------+------------+--------------
 6714761216 |    41318292 | 5288741376 |         78.76 |          8681708 |     
1111258624 |              16.55 |   91810372 |         1.37
(1 row)

Time: 15610.651 ms

In the above, the table_len is nblocks*BLCKSZ, and the dead_tuple_count,
dead_tuple_len, dead_tuple_percent, free_space, and free_percent are all
exact. scanned_percent shows the percentage of pages that were scanned
tuple-by-tuple (the others having been skipped based on the VM bit).
The live tuple count, size, and percentage are all estimates.

The approx_tuple_count is calculated using vac_estimate_reltuples based
on the pages/tuples that were scanned. The approx_tuple_len is the exact
size of the live tuples on scanned pages, plus the approximate size from
skipped pages (BLCKSZ-GetRecordedFreeSpace()). This is an overestimate,
because it's counting the line pointer array as live tuple space.

Even in the worst case, when every page has dead tuples, fastbloat is
marginally faster than pgstattuple. The same table as the first example,
but with every even-numbered row deleted:

ams=# select * from fastbloat('x');
 table_len  | scanned_percent | approx_tuple_count | approx_tuple_len | 
approx_tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | 
free_space | free_percent 
------------+-----------------+--------------------+------------------+----------------------+------------------+----------------+--------------------+------------+--------------
 6714761216 |             100 |           20659150 |       2644371200 |         
       39.38 |         20659142 |     2644370176 |              39.38 | 
1203068996 |        17.92
(1 row)

Time: 8924.511 ms

ams=# select * from pgstattuple('x');
 table_len  | tuple_count | tuple_len  | tuple_percent | dead_tuple_count | 
dead_tuple_len | dead_tuple_percent | free_space | free_percent 
------------+-------------+------------+---------------+------------------+----------------+--------------------+------------+--------------
 6714761216 |    20659150 | 2644371200 |         39.38 |         20659142 |     
2644370176 |              39.38 | 1203068996 |        17.92
(1 row)

Time: 13338.712 ms

Since the code depends on the visibility map to determine which pages to
skip, it does not support indexes (which have no visibility map).

(Just drop the attached files into contrib/fastbloat, and "make install"
should just work. Then just "create extension fastbloat".)

Questions and suggestions welcome.

-- Abhijit

/*
 * contrib/fastbloat/fastbloat.c
 *
 * Abhijit Menon-Sen <a...@2ndquadrant.com>
 * Portions Copyright (c) 2001,2002	Tatsuo Ishii (from pg_stattuple)
 *
 * Permission to use, copy, modify, and distribute this software and
 * its documentation for any purpose, without fee, and without a
 * written agreement is hereby granted, provided that the above
 * copyright notice and this paragraph and the following two
 * paragraphs appear in all copies.
 *
 * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT,
 * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
 * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
 * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED
 * OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
 * A PARTICULAR PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS
 * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
 * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
 */

#include "postgres.h"

#include "access/visibilitymap.h"
#include "access/transam.h"
#include "access/xact.h"
#include "access/multixact.h"
#include "access/htup_details.h"
#include "catalog/namespace.h"
#include "funcapi.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "storage/freespace.h"
#include "storage/procarray.h"
#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/tqual.h"
#include "commands/vacuum.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(fastbloat);
PG_FUNCTION_INFO_V1(fastbloatbyid);

extern Datum fastbloat(PG_FUNCTION_ARGS);
extern Datum fastbloatbyid(PG_FUNCTION_ARGS);

/*
 * tuple_percent, dead_tuple_percent and free_percent are computable,
 * so not defined here.
 */
typedef struct fastbloat_output_type
{
	uint64		table_len;
	uint64		tuple_count;
	uint64		tuple_len;
	uint64		dead_tuple_count;
	uint64		dead_tuple_len;
	uint64		free_space;
	uint64		total_pages;
	uint64		scanned_pages;
} fastbloat_output_type;

static Datum build_output_type(fastbloat_output_type *stat,
							   FunctionCallInfo fcinfo);
static Datum fbstat_relation(Relation rel, FunctionCallInfo fcinfo);
static Datum fbstat_heap(Relation rel, FunctionCallInfo fcinfo);
static HTSV_Result
HeapTupleSatisfiesVacuumNoHint(HeapTuple htup, TransactionId OldestXmin);

/*
 * build a fastbloat_output_type tuple
 */
static Datum
build_output_type(fastbloat_output_type *stat, FunctionCallInfo fcinfo)
{
#define NCOLUMNS	10
#define NCHARS		32

	HeapTuple	tuple;
	char	   *values[NCOLUMNS];
	char		values_buf[NCOLUMNS][NCHARS];
	int			i;
	double		tuple_percent;
	double		dead_tuple_percent;
	double		free_percent;	/* free/reusable space in % */
	double		scanned_percent;
	TupleDesc	tupdesc;
	AttInMetadata *attinmeta;

	/* Build a tuple descriptor for our result type */
	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
		elog(ERROR, "return type must be a row type");

	/*
	 * Generate attribute metadata needed later to produce tuples from raw C
	 * strings
	 */
	attinmeta = TupleDescGetAttInMetadata(tupdesc);

	if (stat->table_len == 0)
	{
		tuple_percent = 0.0;
		dead_tuple_percent = 0.0;
		free_percent = 0.0;
	}
	else
	{
		tuple_percent = 100.0 * stat->tuple_len / stat->table_len;
		dead_tuple_percent = 100.0 * stat->dead_tuple_len / stat->table_len;
		free_percent = 100.0 * stat->free_space / stat->table_len;
	}

	scanned_percent = 0.0;
	if (stat->total_pages != 0)
	{
		scanned_percent = 100 * stat->scanned_pages / stat->total_pages;
	}

	for (i = 0; i < NCOLUMNS; i++)
		values[i] = values_buf[i];
	i = 0;
	snprintf(values[i++], NCHARS, INT64_FORMAT, stat->table_len);
	snprintf(values[i++], NCHARS, "%.2f", scanned_percent);
	snprintf(values[i++], NCHARS, INT64_FORMAT, stat->tuple_count);
	snprintf(values[i++], NCHARS, INT64_FORMAT, stat->tuple_len);
	snprintf(values[i++], NCHARS, "%.2f", tuple_percent);
	snprintf(values[i++], NCHARS, INT64_FORMAT, stat->dead_tuple_count);
	snprintf(values[i++], NCHARS, INT64_FORMAT, stat->dead_tuple_len);
	snprintf(values[i++], NCHARS, "%.2f", dead_tuple_percent);
	snprintf(values[i++], NCHARS, INT64_FORMAT, stat->free_space);
	snprintf(values[i++], NCHARS, "%.2f", free_percent);

	tuple = BuildTupleFromCStrings(attinmeta, values);

	return HeapTupleGetDatum(tuple);
}

/* Returns a tuple with live/dead tuple statistics for the named table.
 */

Datum
fastbloat(PG_FUNCTION_ARGS)
{
	text	   *relname = PG_GETARG_TEXT_P(0);
	RangeVar   *relrv;
	Relation	rel;

	if (!superuser())
		ereport(ERROR,
				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
				 (errmsg("must be superuser to use fastbloat functions"))));

	relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
	rel = relation_openrv(relrv, AccessShareLock);

	PG_RETURN_DATUM(fbstat_relation(rel, fcinfo));
}

/* As above, but takes a reloid instead of a relation name.
 */

Datum
fastbloatbyid(PG_FUNCTION_ARGS)
{
	Oid			relid = PG_GETARG_OID(0);
	Relation	rel;

	if (!superuser())
		ereport(ERROR,
				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
				 (errmsg("must be superuser to use fastbloat functions"))));

	rel = relation_open(relid, AccessShareLock);

	PG_RETURN_DATUM(fbstat_relation(rel, fcinfo));
}

/*
 * A helper function to reject unsupported relation types. We depend on
 * the visibility map to decide which pages we can skip, so we can't
 * support indexes, for example, which don't have a VM.
 */

static Datum
fbstat_relation(Relation rel, FunctionCallInfo fcinfo)
{
	const char *err;

	/*
	 * Reject attempts to read non-local temporary relations; we would be
	 * likely to get wrong data since we have no visibility into the owning
	 * session's local buffers.
	 */
	if (RELATION_IS_OTHER_TEMP(rel))
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("cannot access temporary tables of other sessions")));

	switch (rel->rd_rel->relkind)
	{
		case RELKIND_RELATION:
		case RELKIND_MATVIEW:
			return fbstat_heap(rel, fcinfo);
		case RELKIND_TOASTVALUE:
			err = "toast value";
			break;
		case RELKIND_SEQUENCE:
			err = "sequence";
			break;
		case RELKIND_INDEX:
			err = "index";
			break;
		case RELKIND_VIEW:
			err = "view";
			break;
		case RELKIND_COMPOSITE_TYPE:
			err = "composite type";
			break;
		case RELKIND_FOREIGN_TABLE:
			err = "foreign table";
			break;
		default:
			err = "unknown";
			break;
	}

	ereport(ERROR,
			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
			 errmsg("\"%s\" (%s) is not supported",
					RelationGetRelationName(rel), err)));
	return 0;
}

/*
 * This function takes an already open relation and scans its pages,
 * skipping those that have the corresponding visibility map bit set.
 * For pages we skip, we find the free space from the free space map
 * and approximate tuple_len on that basis. For the others, we count
 * the exact number of dead tuples etc.
 *
 * This scan is loosely based on vacuumlazy.c:lazy_scan_heap(), but
 * we do not try to avoid skipping single pages.
 */

static Datum
fbstat_heap(Relation rel, FunctionCallInfo fcinfo)
{
	BlockNumber scanned,
				nblocks,
				blkno;
	Buffer		vmbuffer = InvalidBuffer;
	HeapTupleData tuple;
	fastbloat_output_type stat = {0};
	BufferAccessStrategy bstrategy;
	TransactionId OldestXmin;

	OldestXmin = GetOldestXmin(rel, true);
	bstrategy = GetAccessStrategy(BAS_BULKREAD);

	scanned = 0;
	nblocks = RelationGetNumberOfBlocks(rel);

	for (blkno = 0; blkno < nblocks; blkno++)
	{
		Buffer		buf;
		Page		page;
		OffsetNumber offnum,
					maxoff;
		Size		freespace;

		/*
		 * If the page has only visible tuples, then we can find out the
		 * free space from the FSM and move on. The remainder of space
		 * on the page (including that used by line pointers)
		 */

		if (visibilitymap_test(rel, blkno, &vmbuffer))
		{
			freespace = GetRecordedFreeSpace(rel, blkno);
			stat.tuple_len += BLCKSZ - freespace;
			stat.free_space += freespace;
			continue;
		}

		buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno,
								 RBM_NORMAL, bstrategy);

		page = BufferGetPage(buf);

		if (PageIsNew(page))
		{
			ReleaseBuffer(buf);
			continue;
		}

		scanned++;

		stat.free_space += PageGetHeapFreeSpace(page);

		/*
		 * Look at each tuple on the page and decide whether it's live
		 * or dead, then count it and its size. Unlike lazy_scan_heap,
		 * we can afford to ignore problems and special cases. We also
		 * do not need to hold a content lock on the buffer because of
		 * our non-hint-bit-setting copy of HeapTupleSatisfiesVacuum.
		 */

		maxoff = PageGetMaxOffsetNumber(page);

		for (offnum = FirstOffsetNumber;
			 offnum <= maxoff;
			 offnum = OffsetNumberNext(offnum))
		{
			ItemId		itemid;

			itemid = PageGetItemId(page, offnum);

			if (!ItemIdIsUsed(itemid) || ItemIdIsRedirected(itemid) ||
				ItemIdIsDead(itemid))
			{
				continue;
			}

			Assert(ItemIdIsNormal(itemid));

			ItemPointerSet(&(tuple.t_self), blkno, offnum);

			tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
			tuple.t_len = ItemIdGetLength(itemid);
			tuple.t_tableOid = RelationGetRelid(rel);

			switch (HeapTupleSatisfiesVacuumNoHint(&tuple, OldestXmin))
			{
				case HEAPTUPLE_DEAD:
				case HEAPTUPLE_RECENTLY_DEAD:
					stat.dead_tuple_len += tuple.t_len;
					stat.dead_tuple_count++;
					break;
				case HEAPTUPLE_LIVE:
					stat.tuple_len += tuple.t_len;
					stat.tuple_count++;
					break;
				case HEAPTUPLE_INSERT_IN_PROGRESS:
				case HEAPTUPLE_DELETE_IN_PROGRESS:
					break;
				default:
					elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
					break;
			}
		}

		ReleaseBuffer(buf);
	}

	stat.table_len = (uint64) nblocks * BLCKSZ;
	stat.tuple_count = vac_estimate_reltuples(rel, false, nblocks, scanned,
											  stat.tuple_count);
	stat.total_pages = nblocks;
	stat.scanned_pages = scanned;

	if (BufferIsValid(vmbuffer))
	{
		ReleaseBuffer(vmbuffer);
		vmbuffer = InvalidBuffer;
	}

	relation_close(rel, AccessShareLock);

	return build_output_type(&stat, fcinfo);
}

/*
 * This is a copy of HeapTupleSatisfiesVacuum with all the hint-bit
 * setting code removed (and thus, some tests reversed). We use this
 * so that we don't need to hold any content locks on the buffer while
 * scanning for dead tuples. (We could use HeapTupleIsSurelyDead(), but
 * that's unnecessarily conservative.)
 *
 * See HTSV for comments.
 */

static HTSV_Result
HeapTupleSatisfiesVacuumNoHint(HeapTuple htup, TransactionId OldestXmin)
{
	HeapTupleHeader tuple = htup->t_data;
	Assert(ItemPointerIsValid(&htup->t_self));
	Assert(htup->t_tableOid != InvalidOid);

	if (!HeapTupleHeaderXminCommitted(tuple))
	{
		if (HeapTupleHeaderXminInvalid(tuple))
			return HEAPTUPLE_DEAD;
		else if (tuple->t_infomask & HEAP_MOVED_OFF)
		{
			TransactionId xvac = HeapTupleHeaderGetXvac(tuple);

			if (TransactionIdIsCurrentTransactionId(xvac))
				return HEAPTUPLE_DELETE_IN_PROGRESS;
			if (TransactionIdIsInProgress(xvac))
				return HEAPTUPLE_DELETE_IN_PROGRESS;
			if (TransactionIdDidCommit(xvac))
			{
				return HEAPTUPLE_DEAD;
			}
		}
		else if (tuple->t_infomask & HEAP_MOVED_IN)
		{
			TransactionId xvac = HeapTupleHeaderGetXvac(tuple);

			if (TransactionIdIsCurrentTransactionId(xvac))
				return HEAPTUPLE_INSERT_IN_PROGRESS;
			if (TransactionIdIsInProgress(xvac))
				return HEAPTUPLE_INSERT_IN_PROGRESS;
			if (!TransactionIdDidCommit(xvac))
			{
				return HEAPTUPLE_DEAD;
			}
		}
		else if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmin(tuple)))
		{
			if (tuple->t_infomask & HEAP_XMAX_INVALID)
				return HEAPTUPLE_INSERT_IN_PROGRESS;
			if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) ||
				HeapTupleHeaderIsOnlyLocked(tuple))
				return HEAPTUPLE_INSERT_IN_PROGRESS;
			return HEAPTUPLE_DELETE_IN_PROGRESS;
		}
		else if (!TransactionIdDidCommit(HeapTupleHeaderGetRawXmin(tuple)))
		{
			return HEAPTUPLE_DEAD;
		}
	}

	if (tuple->t_infomask & HEAP_XMAX_INVALID)
		return HEAPTUPLE_LIVE;

	if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
	{
		if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
		{
			if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
			{
				if ((tuple->t_infomask & (HEAP_XMAX_EXCL_LOCK |
										  HEAP_XMAX_KEYSHR_LOCK)) &&
					MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple)))
					return HEAPTUPLE_LIVE;
			}
			else
			{
				if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuple)))
					return HEAPTUPLE_LIVE;
			}
		}

		return HEAPTUPLE_LIVE;
	}

	if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
	{
		TransactionId xmax;

		if (MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple)))
		{
			Assert(!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask));

			xmax = HeapTupleGetUpdateXid(tuple);

			Assert(TransactionIdIsValid(xmax));

			if (TransactionIdIsInProgress(xmax))
				return HEAPTUPLE_DELETE_IN_PROGRESS;
			else if (TransactionIdDidCommit(xmax))
				return HEAPTUPLE_RECENTLY_DEAD;
			return HEAPTUPLE_LIVE;
		}

		Assert(!(tuple->t_infomask & HEAP_XMAX_COMMITTED));

		xmax = HeapTupleGetUpdateXid(tuple);

		Assert(TransactionIdIsValid(xmax));

		Assert(!TransactionIdIsInProgress(xmax));
		if (TransactionIdDidCommit(xmax))
		{
			if (!TransactionIdPrecedes(xmax, OldestXmin))
				return HEAPTUPLE_RECENTLY_DEAD;
			else
				return HEAPTUPLE_DEAD;
		}

		return HEAPTUPLE_LIVE;
	}

	if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
	{
		if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuple)))
			return HEAPTUPLE_DELETE_IN_PROGRESS;
		else if (!TransactionIdDidCommit(HeapTupleHeaderGetRawXmax(tuple)))
		{
			return HEAPTUPLE_LIVE;
		}
	}

	if (!TransactionIdPrecedes(HeapTupleHeaderGetRawXmax(tuple), OldestXmin))
		return HEAPTUPLE_RECENTLY_DEAD;

	return HEAPTUPLE_DEAD;
}

fastbloat--1.0.sql
Description: application/sql

# fastbloat extension
comment = 'calculate bloat statistics without a full-table scan'
default_version = '1.0'
module_pathname = '$libdir/fastbloat'
relocatable = true

# contrib/fastbloat/Makefile

MODULE_big      = fastbloat
OBJS            = fastbloat.o

EXTENSION = fastbloat
DATA = fastbloat--1.0.sql

REGRESS = fastbloat

ifdef USE_PGXS
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
else
subdir = contrib/fastbloat
top_builddir = ../..
include $(top_builddir)/src/Makefile.global
include $(top_srcdir)/contrib/contrib-global.mk
endif

fastbloat
---------

This extension attempts to measure table bloat without incurring the
cost of a full-table scan.

The pgstattuple extension performs a full-table scan and returns an
exact count of live and dead tuples (and their sizes) and free space.
This code returns the same dead tuple and free space statistics along
with an estimate for the number and size of live tuples.

It does this by skipping pages that have only visible tuples according
to the visibility map (if a page has the corresponding VM bit set, then
it contains no dead tuples). For such pages, it derives the free space
value from the free space map, and assumes that the rest of the space
on the page is taken up by live tuples.

For pages that cannot be skipped, it scans each tuple, recording its
presence and size in the appropriate counters, and adding up the free
space on the page. At the end, it estimates the total number of live
tuples based on the number of pages and tuples scanned (in the same
way that VACUUM estimates pg_class.reltuples).

Here is an example comparing the output of fastbloat and pgstattuple for
the same table:

ams=# select * from fastbloat('x');
 table_len  | scanned_percent | approx_tuple_count | approx_tuple_len | 
approx_tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | 
free_space | free_percent 
------------+-----------------+--------------------+------------------+----------------------+------------------+----------------+--------------------+------------+--------------
 6714761216 |              17 |           41318301 |       5483815648 |         
       81.67 |          8681708 |     1111258624 |              16.55 |   
80972912 |         1.21
(1 row)

Time: 639.455 ms

ams=# select * from pgstattuple('x');
 table_len  | tuple_count | tuple_len  | tuple_percent | dead_tuple_count | 
dead_tuple_len | dead_tuple_percent | free_space | free_percent 
------------+-------------+------------+---------------+------------------+----------------+--------------------+------------+--------------
 6714761216 |    41318292 | 5288741376 |         78.76 |          8681708 |     
1111258624 |              16.55 |   91810372 |         1.37
(1 row)

Time: 15610.651 ms

In the above figures, table_len is exact (being based only on the number
of pages), as are dead_tuple_count, dead_tuple_len, dead_tuple_percent,
free_space, and free_percent.

In the fastbloat output, scanned_percent is the percentage of pages that
were scanned tuple-by-tuple (the remainder having been skipped, thanks
to the visibility map). The approximate tuple count is quite close to
the actual value (as shown by pgstattuple), but the approx_tuple_len
overestimates the size of live tuples (and thus approx_tuple_percent
is also higher than pgstattuple's calculation). The larger
scanned_percent is, the better the estimates are.

fastbloat is several times faster in the example above because it does
less work (by skipping pages and by not needing to hold content locks
on buffers as it scans other pages—see the code for details). The less
bloated the table, the less time it takes; but even in the worst case,
if all pages have dead tuples, it's marginally faster than pgstattuple.

ams=# select * from fastbloat('x');
 table_len  | scanned_percent | approx_tuple_count | approx_tuple_len | 
approx_tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | 
free_space | free_percent 
------------+-----------------+--------------------+------------------+----------------------+------------------+----------------+--------------------+------------+--------------
 6714761216 |             100 |           20659150 |       2644371200 |         
       39.38 |         20659142 |     2644370176 |              39.38 | 
1203068996 |        17.92
(1 row)

Time: 8924.511 ms

ams=# select * from pgstattuple('x');
 table_len  | tuple_count | tuple_len  | tuple_percent | dead_tuple_count | 
dead_tuple_len | dead_tuple_percent | free_space | free_percent 
------------+-------------+------------+---------------+------------------+----------------+--------------------+------------+--------------
 6714761216 |    20659150 | 2644371200 |         39.38 |         20659142 |     
2644370176 |              39.38 | 1203068996 |        17.92
(1 row)

Time: 13338.712 ms

(Note that the statistics from fastbloat are exact in this case, because
every page was scanned according to scanned_percent.)

Author: Abhijit Menon-Sen <a...@2ndquadrant.com>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)

Reply via email to