Re: Race condition in recovery?

Dilip Kumar Tue, 02 Mar 2021 01:45:40 -0800

On Sat, Jan 23, 2021 at 10:06 AM Dilip Kumar <dilipbal...@gmail.com> wrote:
>
> But I am afraid that whether this adjustment (setting based on
> receiveTLI) is done based on some analysis or part of some bug fix.  I
> will try to find the history of this and maybe based on that we can
> make a better decision.


I have done further analysis of this,  so this of initializing the
expectedTLEs from receiveTLI instead of recoveryTargetTLI is done in
below commit.

=====
ee994272ca50f70b53074f0febaec97e28f83c4e
Author: Heikki Linnakangas <heikki.linnakan...@iki.fi>  2013-01-03 14:11:58
Committer: Heikki Linnakangas <heikki.linnakan...@iki.fi>  2013-01-03 14:11:58

    Delay reading timeline history file until it's fetched from master.

    Streaming replication can fetch any missing timeline history files from the
    master, but recovery would read the timeline history file for the target
    timeline before reading the checkpoint record, and before walreceiver has
    had a chance to fetch it from the master. Delay reading it, and the sanity
    checks involving timeline history, until after reading the checkpoint
    record.

    There is at least one scenario where this makes a difference: if you take
    a base backup from a standby server right after a timeline switch, the
    WAL segment containing the initial checkpoint record will begin with an
    older timeline ID. Without the timeline history file, recovering that file
    will fail as the older timeline ID is not recognized to be an ancestor of
    the target timeline. If you try to recover from such a backup, using only
    streaming replication to fetch the WAL, this patch is required for that to
    work.
=====

I did not understand one point about this commit message, "Without the
timeline history file, recovering that file will fail as the older
timeline ID is not recognized to be an ancestor of the target
timeline. "  I mean in which case this will be true?

Now the problem is that we have initialized the expectTLEs based on
the older timeline history file instead of recoveryTargetTLI, which is
breaking the sanity of expectedTLEs.  So in below function
(rescanLatestTimeLine), if we find the newest TLI is same as
recoveryTargetTLI then we assume we don't need to do anything but the
problem is expectedTLEs is set to wrong target and we never update
unless again timeline changes.  So I think first we need to identify
what the above commit is trying to achieve and then can we do it in a
better way without breaking the sanity of expectedTLEs.

rescanLatestTimeLine(void)
{
....
newtarget = findNewestTimeLine(recoveryTargetTLI);
if (newtarget == recoveryTargetTLI)
{
/* No new timelines found */
return false;
}
...
newExpectedTLEs = readTimeLineHistory(newtarget);
...
expectedTLEs = newExpectedTLEs;
}

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: Race condition in recovery?

Reply via email to