On Fri, Jun 24, 2011 at 1:43 PM, Jeff Davis <pg...@j-davis.com> wrote:
>> And anything that is WAL-logged must obey the WAL-before-data rule.
>> We have a system already that ensures that when
>> synchronous_commit=off, CLOG pages can't be flushed before the
>> corresponding WAL record makes it to disk.
>
> In this case, how do you prevent the PD_ALL_VISIBLE from making it to
> disk if you never bumped the LSN when it was set? It seems like you just
> don't have the information to do so, and it seems like the information
> required would be variable in size.
Well, I think that would be a problem for the hypothetical implementer of the persistent snapshot feature. :-)

More seriously, Heikki and I previously discussed creating some systematic method for suppressing full-page images (FPIs) when they are not needed, perhaps by using a bit in the page header to indicate whether an FPI has been generated since the last checkpoint. I think it would be nice to have such a system, but since we don't have clear agreement either that it's a good idea or on what we'd do after that, I'm not inclined to invest time in it. To really get any benefit out of a change in that area, we'd probably need to (a) remove the LSN interlocks that prevent changes from being replayed if the LSN of the page has already advanced beyond the record LSN and (b) change at least some of XLOG_HEAP_{INSERT,UPDATE,DELETE} to be idempotent. But if we went in that direction, that might help to regularize some of this and make it a bit less ad hoc.

> I didn't mean to make this conversation quite so hypothetical. My
> primary points are:
>
> 1. Sometimes it makes sense to break the typical WAL conventions for
> performance reasons. But when we do so, we have to be quite careful,
> because things get complicated quickly.

Yes.

> 2. PD_ALL_VISIBLE is a little bit more complex than other hint bits,
> because the conditions under which it may be set are more complex
> (having to do with both snapshots and cleanup actions). Other hint bits
> are based only on transaction status: either the WAL for that
> transaction completion got flushed (and is therefore permanent), and we
> set the hint bit; or it didn't get flushed and we don't.

I think the term "hint bits" really shouldn't be applied to anything other than HEAP_{XMIN,XMAX}_{COMMITTED,INVALID}. Otherwise, we get into confusing territory pretty quickly. Our algorithm for opportunistically killing index entries pointing to dead tuples is not WAL-logged, but it involves more than a single bit.
OTOH, clearing of the PD_ALL_VISIBLE bit has always been WAL-logged, so lumping that in with HEAP_XMIN_COMMITTED is pretty misleading.

> Just having this discussion has been good enough for me to get a better
> idea what's going on, so if you think the comments are sufficient that's
> OK with me.

I'm not 100% certain they are, but let's wait and see if anyone else wants to weigh in... please do understand I'm not trying to be a pain in the neck. :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company