On Fri, Jun 24, 2011 at 1:43 PM, Jeff Davis <pg...@j-davis.com> wrote:
>> And anything that is WAL-logged must obey the WAL-before-data rule.
>> We have a system already that ensures that when
>> synchronous_commit=off, CLOG pages can't be flushed before the
>> corresponding WAL record makes it to disk.
>
> In this case, how do you prevent the PD_ALL_VISIBLE from making it to
> disk if you never bumped the LSN when it was set? It seems like you just
> don't have the information to do so, and it seems like the information
> required would be variable in size.

Well, I think that would be a problem for the hypothetical implementer
of the persistent snapshot feature.  :-)

More seriously, Heikki and I previously discussed creating some
systematic method for suppressing FPIs when they are not needed,
perhaps by using a bit in the page header to indicate whether an FPI
has been generated since the last checkpoint.  I think it would be
nice to have such a system, but since we don't have a clear agreement
either that it's a good idea or what we'd do after that, I'm not
inclined to invest time in it.  To really get any benefit out of a
change in that area, we'd probably need to (a) remove the LSN
interlocks that prevent changes from being replayed if the LSN of the
page has already advanced beyond the record LSN and (b) change at
least some of XLOG_HEAP_{INSERT,UPDATE,DELETE} to be idempotent.  But
if we went in that direction then that might help to regularize some
of this and make it a bit less ad-hoc.

> I didn't mean to make this conversation quite so hypothetical. My
> primary points are:
>
> 1. Sometimes it makes sense to break the typical WAL conventions for
> performance reasons. But when we do so, we have to be quite careful,
> because things get complicated quickly.

Yes.

> 2. PD_ALL_VISIBLE is a little bit more complex than other hint bits,
> because the conditions under which it may be set are more complex
> (having to do with both snapshots and cleanup actions). Other hint bits
> are based only on transaction status: either the WAL for that
> transaction completion got flushed (and is therefore permanent), and we
> set the hint bit; or it didn't get flushed and we don't.

I think the term "hint bits" really shouldn't be applied to anything
other than HEAP_{XMIN,XMAX}_{COMMITTED,INVALID}.  Otherwise, we get
into confusing territory pretty quickly.  Our algorithm for
opportunistically killing index entries pointing to dead tuples is not
WAL-logged, but it involves more than a single bit.  OTOH, clearing of
the PD_ALL_VISIBLE bit has always been WAL-logged, so lumping that in
with HEAP_XMIN_COMMITTED is pretty misleading.

> Just having this discussion has been good enough for me to get a better
> idea what's going on, so if you think the comments are sufficient that's
> OK with me.

I'm not 100% certain they are, but let's wait and see if anyone else
wants to weigh in...  please do understand I'm not trying to be a pain
in the neck.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)