Hi! On 28.1.2015 05:03, Abhijit Menon-Sen wrote: > At 2015-01-27 17:00:27 -0600, jim.na...@bluetreble.com wrote: >> >> It would be best to get this into a commit fest so it's not lost. > > It's there already. > > -- Abhijit > >
I looked at this patch today, so a few comments from me: 1) I believe the patch is slightly broken, because the version was bumped to 1.3 but only partially, so I get this on "make install" $ make -j9 -s install /usr/bin/install: cannot stat ‘./pgstattuple--1.3.sql’: No such file or directory ../../src/makefiles/pgxs.mk:129: recipe for target 'install' failed make[1]: *** [install] Error 1 Makefile:85: recipe for target 'install-pgstattuple-recurse' failed make: *** [install-pgstattuple-recurse] Error 2 make: *** Waiting for unfinished jobs.... The problem seems to be that Makefile already lists --1.3.sql in the DATA section, but the file was not renamed (there's just --1.2.sql). After fixing that, it's also necessary to fix the version in the control file, otherwise the CREATE EXTENSION fails. default_version = '1.2' -> '1.3' At least that fixed the install for me. 2) Returning Datum from fbstat_heap, which is a static function seems a bit strange to me, I'd use the HeapTuple directly, but meh ... 3) The way the tuple is built by first turning the values into strings and then turned back into Datums by calling BuildTupleFromCStrings is more serious I guess. Why not to just keep the Datums and call heap_form_tuple directly? I see the other functions in pgstattuple.c do use the string way, so maybe there's a reason for that? Or is this just to keep the code consistent in the extension? 4) I really doubt 'fastbloat' is a good name. I propose 'pgstatbloat' to make that consistent with the rest of pgstattuple functions. Or something similar, but 'fastbloat' is just plain confusing. Otherwise, the code looks OK to me. Now, there are a few features I'd like to have for production use (to minimize the impact): 1) no index support :-( I'd like to see support for more relation types (at least btree indexes). Are there any plans for that? Do we have an idea on how to compute that? 2) sampling just a portion of the table For example, being able to sample just 5% of blocks, making it less obtrusive, especially on huge tables. Interestingly, there's a TABLESAMPLE patch in this CF, so maybe it's possible to reuse some of the methods (e.g. functions behind SYSTEM sampling)? 3) throttling Another feature minimizing impact of running this on production might be some sort of throttling, e.g. saying 'limit the scan to 4 MB/s' or something along those lines. 4) prefetch fbstat_heap is using visibility map to skip fully-visible pages, which is nice, but if we skip too many pages it breaks readahead similarly to bitmap heap scan. I believe this is another place where effective_io_concurrency (i.e. prefetch) would be appropriate. regards -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers