On 2011-03-17 15:02, Robert Haas wrote:
On Thu, Mar 17, 2011 at 4:17 AM, Jesper Krogh<jes...@krogh.cc>  wrote:
Is it obvious that the visibillity map bits should track complete
pages and not individual tuples? If the visibillity map tracks at
page-level the benefit would fall on "slim tables" where you squeeze
200 tuples into each page and having an update rate of 1% would
lower the likelyhood even more. (it may be that for slim tables the
index-only-scans are not as benefitial as to wide tables).
I'm not sure exactly what MaxHeapTuplesPerPage works out to be, but
say it's 200.  If you track visibility info per tuple rather than per
page, then the size of the visibility map is going to expand by a
factor of 200.  That might decrease contention, but otherwise it's a
bad thing - the whole point of having the visibility map in the first
place is that it's much, much smaller than the heap.  If it were the
same size as the heap, we could just read the heap.  What the map
attempts to accomplish is to allow us, by reading a small number of
pages, to check whether the tuples we're thinking of reading are
likely to be all-visible without actually looking at them.
Yes, that was sort of the math I was trying to make. I do allthough
belive that you have a way better feeling about it. But according
to this: http://wiki.postgresql.org/wiki/FAQ#How_much_database_disk_space_is_required_to_store_data_from_a_typical_text_file.3F The bulk row-overhead is around 24bytes, which will with 1 bit per row give a size reduction of 1:(24x8) ~1:192, worstcase... that gives at best 341 tuples/page
(where each tuple, does not contain any data at all). With that ratio, the
visibillitymap of a relation of 10GB would fill 52MB on disk (still worst case)
and that by itself would by all means be awesome. (with that small tuples a
10GB relation would have around 42 billion tuples).

On the 1 bit per page the "best case" would be 341 times better than above
reducing the size of the visibiility map on a 10GB table to around 152KB which
is extremely small (and thus also awesome) But the consequenses of a single
update would mean that you loose visibilllity map benefit on 341 tuples in
one shot.

Worst case situations are, where we approach the 4 tuples per page, before
we hit toast where the ratio of space reduction in 1 bit per tuple would be:
1:(2048x8) ~ 1:16384 and the 1 bit per page is 4 times better.
In the 1 bit per tuple a visibillity map of a 10GB relation would be around 610KB
1 bit per page would then drop it to around 160KB.


Can we drag out some average-case numbers on row-size in the heap
from some real production systems?

I may have gotten something hugely wrong in above calculations and/or
have missed some important points.

--
Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to