On Thu, Apr 3, 2014 at 3:11 AM, Abhijit Menon-Sen <a...@2ndquadrant.com> wrote: > > This is a follow-up to the thread at > http://www.postgresql.org/message-id/4eb5fa1b.1090...@2ndquadrant.com > > A quick summary: that thread proposed adding a relation_free_space() > function to the pageinspect extension.
> Various review comments were > received, among which was the suggestion that the code belonged in > pg_stattuple as a faster way to calculate free_percent. You haven't mentioned why you didn't follow that way. After looking at code, I also felt that it is better to add this as a version of pg_stattuple. > === > > I've attached an extension that produces largely pgstattuple-compatible > numbers for a table without doing a full-table scan. > > It scans through the table, skipping blocks that have their visibility > map bit set. For such pages, it gets the free space from the free space > map, and assumes that all remaining space on the page is taken by live > tuples. It scans other pages tuple-by-tuple and counts live and dead > tuples and free space. Is this assumption based on the reason that if the visibility map bit of page is set, then there is high chance that vacuum would have pruned the dead tuples and updated FSM with freespace? In anycase, I think it will be better if you update README and or code comments to mention the reason of such an assumption. 1. compilation errors 1>contrib\fastbloat\fastbloat.c(450): error C2198: 'MultiXactIdIsRunning' : too few arguments for call 1>contrib\fastbloat\fastbloat.c(467): error C2198: 'MultiXactIdIsRunning' : too few arguments for call Recent commit 05315 added new parameter to this API, so this usage of API needs to be updated accordingly. 2. /* Returns a tuple with live/dead tuple statistics for the named table. */ I think this is not a proper multi-line comment. 3. fbstat_heap() { .. for (blkno = 0; blkno < nblocks; blkno++) { .. } It is better to have CHECK_FOR_INTERRUPTS() in above loop. 4. if (PageIsNew(page)) { ReleaseBuffer(buf); continue; } Incase of new page don't you need to account for freespace. Why can't we safely assume that all the space in page is free and add it to freespace. 5. Don't we need the handling for empty page (PageIsEmpty()) case? 6. ItemIdIsDead(itemid)) { continue; } What is the reason for not counting ItemIdIsDead() case for accounting of dead tuples. I think for Vaccum, it is okay to skip that case because same is counted via heap_page_prune() call. 7. HeapTupleSatisfiesVacuumNoHint() a. Why can't we use HeapTupleSatisfiesVisibility() with dirty snapshot as we use for pgstattuple? I think it will have the advantage of keeping the code for fastbloat similar to pgstattuple. b. If we want to use HeapTupleSatisfiesVacuum(), why can't we add one parameter to function which can avoid setting of hintbit. Are you worried about the performance impact it might cause in other flows, if yes then which flow your are mainly worried. c. I agree that there is advantage of avoiding hintbit code, but as this operation will not be frequent, so not sure if its good idea to add a separate version of tuplevisibility function In general, I think the main advantage of this patch comes from the fact that it can skip scanning pages based on visibilitymap, so it seems to me that it is better if we keep other part of code similar to pgstattuple. 8. HeapTupleSatisfiesVacuumNoHint(HeapTuple htup, TransactionId OldestXmin) There is some new code added in corresponding function HeapTupleSatisfiesVacuum(), so if we want to use it, then update the changes in this function as well and I think it is better to move this into tqual.c along with other similar functions. 9. * This routine is shared by VACUUM and ANALYZE. */ double vac_estimate_reltuples(Relation relation, bool is_analyze, BlockNumber total_pages, BlockNumber scanned_pages, double scanned_tuples) function header of vac_estimate_reltuples() suggests that it is used by VACUUM and ANALYZE, I think it will be better to update the comment w.r.t new usage as well. 10. I think it should generate resource file as is done for other modules if you want to keep it as a separate module. Example: MODULE_big = btree_gin OBJS = btree_gin.o $(WIN32RES) 11. README.txt > Here is an example comparing the output of fastbloat and pgstattuple for the same table: Do you really need to specify examples in README. I think it would be better to specify examples in documentation (.sgml). With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com