On Tue, Jan 24, 2012 at 11:24:08AM -0500, Jaime Casanova wrote: > On Mon, Jan 23, 2012 at 7:18 PM, Noah Misch <n...@leadboat.com> wrote: > > If someone feels like > > doing it, +1 for making pgstattuple() count non-leaf free space. > > actually i agreed that non-leaf pages are irrelevant... i just > confirmed that in a production system with 300GB none of the indexes > in an 84M rows table nor in a heavily updated one has more than 1 root > page, all the rest are deleted, half_dead or leaf. so the posibility > of bloat coming from non-leaf pages seems very odd
FWIW, the number to look at is internal_pages from pgstatindex(): [local] test=# create table t4 (c) as select * from generate_series(1,1000000); SELECT 1000000 [local] test=# alter table t4 add primary key(c); NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "t4_pkey" for table "t4" ALTER TABLE [local] test=# select * from pgstatindex('t4_pkey'); -[ RECORD 1 ]------+--------- version | 2 tree_level | 2 index_size | 22478848 root_block_no | 290 internal_pages | 10 leaf_pages | 2733 empty_pages | 0 deleted_pages | 0 avg_leaf_density | 90.06 leaf_fragmentation | 0 So, 0.4% of this index. They appear in proportion to the logarithm of the total index size. I agree that bloat centered on them is unlikely. Counting them would be justified, but that is a question of formal accuracy rather than practical importance. > but the possibility of bloat coming from the meta page doesn't exist, > AFAIUI at least > > we need the most accurate value about usable free space, because the > idea is to add a sampler mode to the function so we don't scan the > whole relation. that's why we still need the function. I doubt we'd add this function solely on the basis that a future improvement will make it useful. For the patch to go in now, it needs to be useful now. (This is not a universal principle, but it mostly holds for low-complexity patches like this one.) All my comments below would also apply to such a broader patch. > btw... pgstattuple also has the problem that it's not using a ring buffer > > > attached are two patches: > - v5: is the same original patch but only track space in leaf, deleted > and half_dead pages > - v5.1: adds the same for all kind of indexes (problem is that this is > inconsistent with the fact that pageinspect only manages btree indexes > for everything else) Let's take a step back. Again, what you're proposing is essentially a faster implementation of "SELECT free_percent FROM pgstattuple(rel)". If this code belongs in core at all, it belongs in the pgstattuple module. Share as much infrastructure as is reasonable between the user-visible functions of that module. For example, I'm suspecting that the pgstat_index() call tree should be shared, with pgstat_index_page() checking a flag to decide whether to gather per-tuple stats. Next, compare the bits of code that differ between pgstattuple() and relation_free_space(), convincing yourself that the differences are justified. Each difference will yield one of the following conclusions: 1. Your code contains an innovation that would apply to both functions. Where not too difficult, merge these improvements into pgstattuple(). In order for a demonstration of your new code's better performance to be interesting, we must fix the same low-hanging fruit in its competitor. One example is the use of the bulk read strategy. Another is the support for SP-GiST. 2. Your code is missing an essential behavior of pgstattuple(). Add it to your code. One example is the presence of CHECK_FOR_INTERRUPTS() calls. 3. Your code behaves differently from pgstattuple() due to a fundamental difference in their tasks. These are the only functional differences that ought to remain in your finished patch; please point them out in comments. For example, pgstat_heap() visits every tuple in the heap. You'll have no reason to do that; pgstattuple() only needs it to calculate statistics other than free_percent. In particular, I call your attention to the fact that pgstattuple() takes shared buffer content locks before examining pages. Your proposed patch does not do so. I do not know with certainty whether that falls under #1 or #2. The broad convention is to take such locks, because we elsewhere want an exact answer. These functions are already inexact; they make no effort to observe a consistent snapshot of the table. If you convince yourself that the error arising from not locking buffers is reasonably bounded, we can lose the locks (in both functions -- conclusion #1). Comments would then be in order. With all that done, run some quick benchmarks: see how "SELECT free_percent FROM pgstattuple(rel)" fares compared to "SELECT relation_free_space(rel)" for a large heap and for a large B-tree index. If the timing difference is too small to be interesting to you, remove relation_free_space() and submit your pgstattuple() improvements alone. Otherwise, submit as written. I suppose I didn't make all of this clear enough earlier; sorry for that. Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers