On Fri, Jun 17, 2011 at 01:21:17PM -0400, Robert Haas wrote: > On Thu, Jun 16, 2011 at 11:17 PM, Noah Misch <n...@leadboat.com> wrote: > > I suggest revisiting the suggestion in > > http://archives.postgresql.org/message-id/27743.1291135...@sss.pgh.pa.us and > > including a latestRemovedXid in xl_heap_visible. ?The range of risky xids is > > the same for setting a visibility map bit as for deleting a dead tuple, and > > the same operation (VACUUM) produces both conflicts. > > See Heikki's follow-up email. The standby has to ignore all-visible > bits anyway, because the fact that a transaction is all-visible on the > master does not imply that it is all-visible on the standby. So I > don't think there's a problem here.
Indeed. If we conflicted on the xmin of the most-recent all-visible tuple when setting the bit, all-visible on the standby would become a superset of all-visible on the master, and we could cease to ignore the bits. This is an excellent trade-off for masters that do UPDATE and DELETE, because they're already conflicting (or using avoidance measures) on similar xids at VACUUM time. (Some INSERT-only masters will disfavor the trade-off, but we could placate them with a GUC. On the other hand, hot_standby_feedback works and costs an INSERT-only master so little.) > > I proceeded to do some immediate-shutdown tests to see if we get the bits > > set > > as expected. ?I set up a database like this: > > ? ? ? ?CREATE EXTENSION pageinspect; > > ? ? ? ?CREATE TABLE t (c int); > > ? ? ? ?INSERT INTO t VALUES (2); > > ? ? ? ?CHECKPOINT; > > I normally cleared bits with "UPDATE t SET c = 1; CHECKPOINT;" and set them > > with "VACUUM t". ?I checked bits with this query: > > ? ? ? ?SELECT to_hex(get_byte(get_raw_page('t', 'vm', 0), 24)), > > ? ? ? ? ? ? ? to_hex(flags::int) FROM page_header(get_raw_page('t', 0)); > > The row from that query should generally be 0,1 (bits unset) or 1,5 (bits > > set). ?0,5 is fine after a crash. ?1,1 means we've broken our contract: the > > VM > > bit is set while PD_ALL_VISIBLE is not set. > > > > First test was to clear bits, checkpoint, then VACUUM and SIGQUIT the > > postmaster. ?The system came back up with 1/1 bits. ?I poked around enough > > to > > see that XLByteLE(lsn, PageGetLSN(page)) was failing, but I did not dig any > > deeper toward a root cause. ?If you have trouble reproducing this, let me > > know > > so I can assemble a complete, self-contained test case. > > Thank you for putting these test cases together. As you can probably > tell, I was having difficulty figuring out exactly how to test this. No problem. I ran these test cases with the new patch: -Description- -Expected bits- -Result- set bit, VACUUM commit, crash 1,5 1,5 set bit, crash 0,1 0,1 set bit, flush heap page, crash 0,5 0,5 set bit, flush vm page, crash 1,5 1,5 xlog flush request locations look correct, too. Overall, looking good. Do you know of other notable cases to check? The last two were somewhat tricky to inject; if anyone wishes to reproduce them, I'm happy to share my recipe. One more thing came to mind: don't we need a critical section inside the "if" block in visibilitymap_set()? I can't see anything in there that could elog(ERROR, ...), but my general impression is that we use one anyway. I ran one repetition of my benchmark from last report, and the amount of WAL has not changed. Given that, I think the previous runs remain valid. > I think that the problem here is that the sense of that test is > exactly backwards from what it should be. IIUC, the word "precedes" > in the previous comment should in fact say "follows". Ah; quite so. > >> + ? ? ? ? ? ? /* Don't set the bit if replay has already passed this > >> point. */ > >> + ? ? ? ? ? ? if (XLByteLE(lsn, PageGetLSN(BufferGetPage(vmbuffer)))) > >> + ? ? ? ? ? ? ? ? ? ? visibilitymap_set(reln, xlrec->block, lsn, > >> &vmbuffer); > > > > ... wouldn't it be better to do this unconditionally? ?Some later record > > will > > unset it if needed, because all replay functions that clear the bit do so > > without consulting the vm page LSN. ?On the other hand, the worst > > consequence > > of the way you've done it is VACUUM visiting the page one more time to set > > the > > bit. > > Hmm, now that I look at it, I think this test is backwards too. I > think you might be right that it wouldn't hurt anything to go ahead > and set it, but I'm inclined to leave it in for now. Okay. Consider expanding the comment to note that this is out of conservatism rather than known necessity. > > I think it's worth noting, perhaps based on your explanation in the > > second-to-last paragraph of > > http://archives.postgresql.org/message-id/BANLkTi=b7jvmq6fa_exlcygzuyv1u9a...@mail.gmail.com > > that the answer may be incorrect again after the recheck. ?We don't care: > > redundant clearings of the visibility bit are no problem. > > I added a comment. See what you think. That should help. > > Several lines above are inconsistent on internal tabs/spaces. > > Fixed... I hope. Probably not worth mentioning, but there remains one space instead of one tab after "visibilitymap_clear" in this line: * visibilitymap_clear - clear a bit in the visibility map FWIW, I just do C-s TAB in emacs and then scan for places where the boundaries of the highlighted regions fail to line up vertically. > > This could just as well be an Assert. > > Maybe, but I'm going to leave it the way it is for now. It's not the > only place in the code where we do stuff like that. The cost of > testing it in a non-assert-enabled build is quite small. Okay. > > This fills RelationGetBufferForTuple()'s needs, but the name doesn't seem to > > fit the task too well. ?This function doesn't check the pin or that the > > buffer > > applies to the correct relation. ?Consider naming it something like > > `visibilitymap_buffer_covers_block'. > > I think that this may be related to the fact that visibilitymap_pin() > doesn't do what you might expect. But that name is pre-existing, so I > kept it, and I don't really want this to be something that sounds > totally unrelated. Ah, so visibilitymap_pin_ok() answers the question "Will visibilitymap_pin() be a long-winded no-op given these arguments?" -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers