Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)

Tomas Vondra Sun, 22 Feb 2015 17:42:52 -0800

Hi!

On 28.1.2015 05:03, Abhijit Menon-Sen wrote:
> At 2015-01-27 17:00:27 -0600, [email protected] wrote:
>>
>> It would be best to get this into a commit fest so it's not lost.
> 
> It's there already.
> 
> -- Abhijit
> 
>


I looked at this patch today, so a few comments from me:

1) I believe the patch is slightly broken, because the version was
   bumped to 1.3 but only partially, so I get this on "make install"

    $ make -j9 -s install
    /usr/bin/install: cannot stat ‘./pgstattuple--1.3.sql’: No such
    file or directory
    ../../src/makefiles/pgxs.mk:129: recipe for target 'install' failed
    make[1]: *** [install] Error 1
    Makefile:85: recipe for target 'install-pgstattuple-recurse' failed
    make: *** [install-pgstattuple-recurse] Error 2
    make: *** Waiting for unfinished jobs....

   The problem seems to be that Makefile already lists --1.3.sql in the
   DATA section, but the file was not renamed (there's just --1.2.sql).

   After fixing that, it's also necessary to fix the version in the
   control file, otherwise the CREATE EXTENSION fails.

      default_version = '1.2' -> '1.3'

   At least that fixed the install for me.


2) Returning Datum from fbstat_heap, which is a static function seems a
   bit strange to me, I'd use the HeapTuple directly, but meh ...


3) The way the tuple is built by first turning the values into strings
   and then turned back into Datums by calling BuildTupleFromCStrings
   is more serious I guess. Why not to just keep the Datums and call
   heap_form_tuple directly?

   I see the other functions in pgstattuple.c do use the string way, so
   maybe there's a reason for that? Or is this just to keep the code
   consistent in the extension?

4) I really doubt 'fastbloat' is a good name. I propose 'pgstatbloat'
   to make that consistent with the rest of pgstattuple functions. Or
   something similar, but 'fastbloat' is just plain confusing.


Otherwise, the code looks OK to me. Now, there are a few features I'd
like to have for production use (to minimize the impact):

1) no index support :-(

   I'd like to see support for more relation types (at least btree
   indexes). Are there any plans for that? Do we have an idea on how to
   compute that?

2) sampling just a portion of the table

   For example, being able to sample just 5% of blocks, making it less
   obtrusive, especially on huge tables. Interestingly, there's a
   TABLESAMPLE patch in this CF, so maybe it's possible to reuse some
   of the methods (e.g. functions behind SYSTEM sampling)?

3) throttling

   Another feature minimizing impact of running this on production might
   be some sort of throttling, e.g. saying 'limit the scan to 4 MB/s'
   or something along those lines.

4) prefetch

   fbstat_heap is using visibility map to skip fully-visible pages,
   which is nice, but if we skip too many pages it breaks readahead
   similarly to bitmap heap scan. I believe this is another place where
   effective_io_concurrency (i.e. prefetch) would be appropriate.

regards

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)

Reply via email to