Simon Riggs wrote:
Hmm, I think the question is: How many hint bits need to be set before we mark the buffer dirty? (N)

Should it be 1, as it is now? Should it be never? Never is a long time. As N increases, clog accesses increase. So it would seem there is likely to be an optimal value for N.

After further thought, I begin to think that the number of times we set
a dirty hint-bit shouldn't influence the decision of whether to dirty
the page too much. Instead, we should look at the *age* of the last xid
which modified the tuple. The idea is that the clog pages showing the
status of "young" xids are far more likely to be cached that the pages
for "older" xids. This makes a lost hint-bit update much cheaper for
young than for old xids, because we probably won't waste any IO if we
have to set the hint-bit again later, because the buffer was evicted
from shared_buffers before being written out. Additionally, I think we
should put some randomness into the decision, to spread the IO caused by
hit-bit updates after a batch load.

All in all, I envision a formula like
chance_of_dirtying = min(1,
  alpha
  *floor((next_xid - last_modifying_xid)/clog_page_size)
  /clog_buffers
)

This means that a hint-bit update never triggers dirtying if the last
modifying xid belongs to the same clog page as the next unused xid -
which sounds good, since that clog page gets touched on every commit and
abort, and therefore is cached nearly for sure.

For xids on older pages, the chance of dirtying grows (more aggresivly
for larger alpha values). For alpha = 1, a hint-bit update dirties a
buffer for sure only if the xid is older than clog_page_size*clog_buffers.

regards,
Florian Pflug

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to