On Wed, Mar 28, 2018 at 1:21 AM, Pavan Deolasee
<pavan.deola...@gmail.com> wrote:
>
>
> TBH I still don't see why this does not provide the same guarantee that the
> current code provides, but given the concerns expressed by others, I am not
> gonna pursue beyond a point. But one last time :-)
>
> The current code uses xl_prev to cross-verify that record B, read after
> record A, indeed follows A and has a valid back-link to A. This deals with
> problems where B might actually be an old WAL record, carried over from a
> stale WAL file.
>
> Now if we store xl_curr, we can definitely guarantee that B is ahead of A
> because B->xl_curr will be greater than A->xl_curr (otherwise we bail out).
> So that deals with the problem of stale WAL records. In addition, we also
> know where A ends (we can deduce that even for XLOG_SWITCH records knowing
> where the next record will start after the switch) and hence we know where B
> should start. So we read at B and also confirm that B->xl_curr matches B's
> position. If it does not, we declare end-of-WAL and bail out. So where is
> the problem?
>
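For illustration only, the check described in the quoted text above could be
sketched roughly as follows. This is Python rather than the C of the real WAL
reader, and everything except the xl_curr field name is an assumption made for
the sketch: the Record class, the next_valid_record function, and the dict
standing in for WAL storage are hypothetical. Alignment padding, page headers,
and XLOG_SWITCH handling are deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Record:
    """Hypothetical stand-in for a WAL record header."""
    xl_curr: int  # LSN the record claims as its own start (the proposed field)
    length: int   # total record length in bytes, assumed > 0


def next_valid_record(wal: Dict[int, Record], prev: Record) -> Optional[Record]:
    """Return the record that follows prev, or None to signal end-of-WAL.

    wal maps a byte position to whatever record is physically stored there,
    which may be a leftover from a stale, recycled WAL file.
    """
    # We know where A ends, hence where B has to start.
    expected = prev.xl_curr + prev.length
    candidate = wal.get(expected)
    if candidate is None:
        return None  # nothing readable at that position
    # B must claim exactly the position we read it from. Since length > 0,
    # this also implies B.xl_curr > A.xl_curr, so a stale record carried
    # over from an old file is rejected here.
    if candidate.xl_curr != expected:
        return None
    return candidate
```

In this toy model, a record left over from an older file still carries its old
xl_curr, so it cannot match the position it is read from.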
This seems to have got a bit lost in subsequent discussion.

>>
>> > 2. Does the new logic in pg_rewind to search backward for a checkpoint
>> > work reliably, and will it be slow?
>>
>> If you have to search backwards, this breaks it. Full stop.
>
>
> We don't really need to fetch the previous record. We really need to find
> the last checkpoint prior to a given LSN. That can be done by reading WAL
> segments forward. It can be a little slow, but hopefully not a whole lot.
>
> A <- B <- C <- CHKPT <- D <- E <- F <- G
>
> So today, if we want to find last checkpoint prior to G, we go through the
> back-links until we find the first checkpoint record. In the proposed code,
> we read forward the current WAL segment, remember the last CHKPT record seen
> and once we see G, we know we have found the prior checkpoint. If the
> checkpoint does not exist in the current WAL, we read forward the previous
> WAL and return the last checkpoint record in that WAL and so on. So in the
> worst case, we might read an extra WAL segment before we find the checkpoint
> record. That's not ideal but not too bad given that only pg_rewind needs
> this and that too only once.
>

Some degree of slowdown in pg_rewind seems an acceptable price to pay
as long as it doesn't introduce errors.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
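
For illustration only, the forward scan described in the quoted text could be
sketched roughly as follows. This is a Python toy model, not pg_rewind code:
the Segment type, the last_checkpoint_before function, and the representation
of a segment as a list of (lsn, is_checkpoint) pairs are all assumptions made
for the sketch.

```python
from typing import List, Optional, Tuple

# Hypothetical model: a segment is a list of (lsn, is_checkpoint) pairs in
# increasing LSN order; segments are ordered oldest to newest.
Segment = List[Tuple[int, bool]]


def last_checkpoint_before(segments: List[Segment], target_lsn: int) -> Optional[int]:
    """Return the LSN of the last checkpoint strictly before target_lsn."""
    # Find the segment holding the target: the newest one that starts at or
    # before target_lsn.
    start = None
    for i, seg in enumerate(segments):
        if seg and seg[0][0] <= target_lsn:
            start = i
    if start is None:
        return None

    # Read that segment forward, remembering the last checkpoint seen. If it
    # has none before the target, fall back to the previous segment, and so on.
    for i in range(start, -1, -1):
        last = None
        for lsn, is_checkpoint in segments[i]:
            if lsn >= target_lsn:
                break  # reached the target record; stop scanning
            if is_checkpoint:
                last = lsn
        if last is not None:
            return last
    return None


# The A..G example from the quoted text, with CHKPT as the fourth record.
wal = [[(1, False), (2, False), (3, False), (4, True),
        (5, False), (6, False), (7, False), (8, False)]]
print(last_checkpoint_before(wal, 8))  # prints 4
```

Each segment is read forward at most once, so in this model the extra cost
over following back-links is bounded by the number of segments between the
target and the checkpoint.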