Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)

Andres Freund Fri, 08 May 2015 17:21:52 -0700

Hi,

On 2015-04-24 08:46:48 +0530, Abhijit Menon-Sen wrote:
> diff --git a/contrib/pgstattuple/pgstatbloat.c 
> b/contrib/pgstattuple/pgstatbloat.c
> new file mode 100644
> index 0000000..612e22b
> --- /dev/null
> +++ b/contrib/pgstattuple/pgstatbloat.c
> @@ -0,0 +1,389 @@
> +/*
> + * contrib/pgstattuple/pgstatbloat.c
> + *
> + * Abhijit Menon-Sen <[email protected]>
> + * Portions Copyright (c) 2001,2002  Tatsuo Ishii (from pgstattuple)


I think for new extension we don't really add authors and such anymore.

> + * Permission to use, copy, modify, and distribute this software and
> + * its documentation for any purpose, without fee, and without a
> + * written agreement is hereby granted, provided that the above
> + * copyright notice and this paragraph and the following two
> + * paragraphs appear in all copies.
> + *
> + * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT,
> + * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
> + * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
> + * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED
> + * OF THE POSSIBILITY OF SUCH DAMAGE.
> + *
> + * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS
> + * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
> + * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
> + */

Shouldn't be here in a contrib module.

> +PG_FUNCTION_INFO_V1(pgstatbloat);

> +CREATE FUNCTION pgstatbloat(IN reloid regclass,
> +    OUT table_len BIGINT,               -- physical table length in bytes
> +    OUT scanned_percent FLOAT8,         -- what percentage of the table's 
> pages was scanned
> +    OUT approx_tuple_count BIGINT,      -- estimated number of live tuples
> +    OUT approx_tuple_len BIGINT,        -- estimated total length in bytes 
> of live tuples
> +    OUT approx_tuple_percent FLOAT8,    -- live tuples in % (based on 
> estimate)
> +    OUT dead_tuple_count BIGINT,        -- exact number of dead tuples
> +    OUT dead_tuple_len BIGINT,          -- exact total length in bytes of 
> dead tuples
> +    OUT dead_tuple_percent FLOAT8,      -- dead tuples in % (based on 
> estimate)
> +    OUT free_space BIGINT,              -- exact free space in bytes
> +    OUT free_percent FLOAT8)            -- free space in %

Hm. I do wonder if this should really be called 'statbloat'. Perhaps
it'd more appropriately be called pg_estimate_bloat or somesuch?

Also, is free_space really exact? The fsm isn't byte exact.

> +static Datum
> +build_output_type(pgstatbloat_output_type *stat, FunctionCallInfo fcinfo)
> +{
> +#define NCOLUMNS     10
> +#define NCHARS               32
> +
> +     HeapTuple       tuple;
> +     char       *values[NCOLUMNS];
> +     char            values_buf[NCOLUMNS][NCHARS];
> +     int                     i;
> +     double          tuple_percent;
> +     double          dead_tuple_percent;
> +     double          free_percent;   /* free/reusable space in % */
> +     double          scanned_percent;
> +     TupleDesc       tupdesc;
> +     AttInMetadata *attinmeta;
> +
> +     /* Build a tuple descriptor for our result type */
> +     if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
> +             elog(ERROR, "return type must be a row type");
> +
> +     /*
> +      * Generate attribute metadata needed later to produce tuples from raw C
> +      * strings
> +      */
> +     attinmeta = TupleDescGetAttInMetadata(tupdesc);
> +
> +     if (stat->table_len == 0)
> +     {
> +             tuple_percent = 0.0;
> +             dead_tuple_percent = 0.0;
> +             free_percent = 0.0;
> +     }
> +     else
> +     {
> +             tuple_percent = 100.0 * stat->tuple_len / stat->table_len;
> +             dead_tuple_percent = 100.0 * stat->dead_tuple_len / 
> stat->table_len;
> +             free_percent = 100.0 * stat->free_space / stat->table_len;
> +     }
> +
> +     scanned_percent = 0.0;
> +     if (stat->total_pages != 0)
> +     {
> +             scanned_percent = 100 * stat->scanned_pages / stat->total_pages;
> +     }
> +
> +     for (i = 0; i < NCOLUMNS; i++)
> +             values[i] = values_buf[i];
> +     i = 0;
> +     snprintf(values[i++], NCHARS, INT64_FORMAT, stat->table_len);
> +     snprintf(values[i++], NCHARS, "%.2f", scanned_percent);
> +     snprintf(values[i++], NCHARS, INT64_FORMAT, stat->tuple_count);
> +     snprintf(values[i++], NCHARS, INT64_FORMAT, stat->tuple_len);
> +     snprintf(values[i++], NCHARS, "%.2f", tuple_percent);
> +     snprintf(values[i++], NCHARS, INT64_FORMAT, stat->dead_tuple_count);
> +     snprintf(values[i++], NCHARS, INT64_FORMAT, stat->dead_tuple_len);
> +     snprintf(values[i++], NCHARS, "%.2f", dead_tuple_percent);
> +     snprintf(values[i++], NCHARS, INT64_FORMAT, stat->free_space);
> +     snprintf(values[i++], NCHARS, "%.2f", free_percent);
> +
> +     tuple = BuildTupleFromCStrings(attinmeta, values);
> +
> +     return HeapTupleGetDatum(tuple);
> +}

Why go through C strings?  I personally think we really shouldn't add
more callers to BuildTupleFromCStrings, it's such an absurd interface.

> +     switch (rel->rd_rel->relkind)
> +     {
> +             case RELKIND_RELATION:
> +             case RELKIND_MATVIEW:
> +                     PG_RETURN_DATUM(pgstatbloat_heap(rel, fcinfo));
> +             case RELKIND_TOASTVALUE:
> +                     err = "toast value";
> +                     break;
> +             case RELKIND_SEQUENCE:
> +                     err = "sequence";
> +                     break;
> +             case RELKIND_INDEX:
> +                     err = "index";
> +                     break;
> +             case RELKIND_VIEW:
> +                     err = "view";
> +                     break;
> +             case RELKIND_COMPOSITE_TYPE:
> +                     err = "composite type";
> +                     break;
> +             case RELKIND_FOREIGN_TABLE:
> +                     err = "foreign table";
> +                     break;
> +             default:
> +                     err = "unknown";
> +                     break;
> +     }
>

This breaks translateability. I'm not sure that's worthy of concern. I
think it'd actually be fine to just say that the relation has to be a
table or matview.

> +static Datum
> +pgstatbloat_heap(Relation rel, FunctionCallInfo fcinfo)
> +{
> +     /*
> +      * We use the visibility map to skip over SKIP_PAGES_THRESHOLD or
> +      * more contiguous all-visible pages. See the explanation in
> +      * lazy_scan_heap for the rationale.
> +      */

I don't believe that rationale is really true these days. I'd much
rather just rip this out here than copy the rather complex logic.

> +     for (blkno = 0; blkno < nblocks; blkno++)
> +     {

> +             stat.free_space += PageGetHeapFreeSpace(page);
> +
> +             if (PageIsNew(page) || PageIsEmpty(page))
> +             {
> +                     UnlockReleaseBuffer(buf);
> +                     continue;
> +             }

I haven't checked, but I'm not sure that it's safe/meaningful to call
PageGetHeapFreeSpace() on a new page.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)

Reply via email to