Hello,

At Tue, 28 Mar 2017 08:50:58 -0700, Jeff Janes <jeff.ja...@gmail.com> wrote in <CAMkU=1zkfqgepwg+qqkthmwerbn8uaa2_9sb+qtuurehfkq...@mail.gmail.com>
> > > I now think this is not the cause of the problem I am seeing.  I made
> > > the replay of FREEZE_PAGE update the FSM (both with and without FPI),
> > > but that did not fix it.  With frequent crashes, it still accumulated
> > > a lot of frozen and empty (but full according to the FSM) pages.  I
> > > also set up replica streaming and turned off crashing on the master,
> > > and the FSM of the replica stays accurate, so the WAL stream and
> > > replay logic is doing the right thing on the replica.
> > >
> > > I now think the dirtied FSM pages are somehow not getting marked as
> > > dirty, or are getting marked as dirty but the checkpoint is somehow
> > > skipping them.  It looks like MarkBufferDirtyHint does do some
> > > operations unlocked, which could explain a lost update, but it seems
> > > unlikely that that would happen often enough to account for the
> > > amount of lost updates I am seeing.
> >
> > Hmm.. clearing the dirty hint seems already protected by exclusive
> > lock.  And I think this can occur without any locking failure.
> >
> > Other than by FPI, the FSM update is omitted when the record LSN is
> > older than the page LSN.  If the heap page is evicted but the FSM page
> > is not after vacuuming and before a power cut, replaying HEAP2_CLEAN
> > skips the update of the FSM even though no FPI is attached.  Of course
> > this cannot occur on a standby.  One FSM page covers about 4k heap
> > pages, so an FSM page can stay in memory far longer than the heap
> > pages it covers.
>
> This corresponds to the action == BLK_DONE case, right?
Yes.  WAL with an older LSN results in BLK_DONE.  That works as long as
the heap page and the FSM are consistent, but in the situation above it
leaves the FSM broken during crash recovery.

> > ALL_FROZEN is also set by records other than HEAP2_FREEZE_PAGE.  When
> > a page is already empty on entering lazy_scan_heap, or a page of a
> > non-indexed heap is emptied in lazy_scan_heap, HEAP2_VISIBLE is
> > issued to set ALL_FROZEN.
> >
> > Perhaps the problem will be fixed by forcing heap_xlog_visible to
> > update the FSM (in addition to FREEZE_PAGE), or by doing the same in
> > heap_xlog_clean.  (As mentioned in the previous mail, I prefer the
> > latter.)
>
> When I make heap_xlog_clean update the FSM even on BLK_RESTORED (but
> not on BLK_DONE), it solves the problem I was seeing.  Which still
> leaves me wondering why the problem doesn't show up on the standby,
> because, unlike BLK_DONE, BLK_RESTORED should have the same issue on a
> standby as it does on a recovering master, shouldn't it?  Maybe the
> difference is that the existence of a replication slot delays the
> cleanup in a way that causes a different pattern of WAL records.

While all WAL records are new to the target page during standby
recovery, several WAL records at the beginning can be old in a crash
recovery.

> > > > >    /*
> > > > >     * Update the FSM as well.
> > > > >     *
> > > > >     * XXX: Don't do this if the page was restored from full page
> > > > >     * image.  We don't bother to update the FSM in that case, it
> > > > >     * doesn't need to be totally accurate anyway.
> > > > >     */
> > >
> > > What does that save us?  If we restored from an FPI, we already
> > > have the block in memory (we don't need to see the old version,
> > > just the new one), so it doesn't save us a random read IO.
> >
> > Updates on random pages can cause visits to many unloaded FSM pages.
> > It may be intending to avoid that.
>
> But I think that that would be no worse for BLK_RESTORED than it is
> for BLK_NEEDS_REDO.
> Why optimize only one of the cases, if it is worth optimizing either
> one?

I agree with you.  An FPI increases and decreases free space just the
same as redoing the WAL record does.  The following is the discussion
about that.

https://www.postgresql.org/message-id/49072021.7010801%40enterprisedb.com
https://www.postgresql.org/message-id/24334.1225205478%40sss.pgh.pa.us

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> > One issue with this patch is that it doesn't update the FSM at all
> > when pages are restored from full page images.  It would require
> > fetching the page and checking the free space on it, or peeking into
> > the size of the backup block data, and I'm not sure if it's worth
> > the extra code to do that.
>
> I'd vote not to bother, at least not in the first cut.  As you say,
> 100% accuracy isn't required, and I think that in typical scenarios an
> insert/update that causes a page to become full would be relatively
> less likely to have a full-page image.

So, the reason seems to be that it just didn't seem necessary.

Including another branch of this thread, the following options have
been proposed.

- Let FREEZE_PAGE and VISIBLE update the FSM.  This causes an extra
  fetch of a heap page, summing-up of free space, and an FSM update for
  every frozen page.

- Let CLEAN always update the FSM.  This causes extra counting of free
  space and an FSM update for every vacuumed heap page, regardless of
  frozen-ness.

- Let FREEZE_PAGE/VISIBLE or CLEAN records carry the free space.  This
  doesn't need to fetch a heap page, but it breaks the policy (really?)
  that the FSM is not WAL-logged, or that the FSM is updated only as a
  result of heap updates.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers