On 2020/06/30 9:14, Kyotaro Horiguchi wrote:
Opps! I misunderstood that.

At Mon, 29 Jun 2020 13:00:25 +0000, "higuchi.dais...@fujitsu.com" 
<higuchi.dais...@fujitsu.com> wrote in
Fujii-san, thank you for comments.

The cause of this problem is that the checkpointer's sleep time is calculated
>from both checkpoint_timeout and archive_timeout during normal running,
but calculated only from checkpoint_timeout during recovery. So Daisuke-san's
patch tries to change that so that it's calculated from both of them even
during recovery. No?

Yes, it's exactly so.

last_xlog_switch_time is not updated during recovery. So "elapsed_secs" can be
large and cur_timeout can be negative. Isn't this problematic?

Yes... My patch was missing this.

The patch also makes WaitLatch called with zero timeout, which causes
assertion failure.

How about using the original archive_timeout value for calculating cur_timeout 
during recovery?

                 if (XLogArchiveTimeout > 0 && !RecoveryInProgress())
                 {
                         elapsed_secs = now - last_xlog_switch_time;
                         if (elapsed_secs >= XLogArchiveTimeout)
                                 continue;               /* no sleep for us ... 
*/
                         cur_timeout = Min(cur_timeout, XLogArchiveTimeout - 
elapsed_secs);
                 }
+               else if (XLogArchiveTimeout > 0)
+                       cur_timeout = Min(cur_timeout, XLogArchiveTimeout);

During recovery, accurate cur_timeout is not calculated because elapsed_secs is 
not used.

Yes, that's an idea. But I'm a bit concerned about that this change makes
checkpointer wake up more frequently than necessary during recovery.
Which may increase the idle power consumption of checkpointer during
recovery. Of course, this would not be so problematic in practice because
we can expect that archive_timeout is not so small. But it seems better to
avoid unncessary wake-ups if we can easily do that.

However, after recovery is complete, WAL archiving will start by the next 
archive_timeout is reached.
I felt it is enough to solve this problem.

That causes unwanted change of cur_timeout during recovery.

As another approach, what about waking the checkpointer up at the end of
recovery like we already do for walsenders?

We don't want change checkpoint interval during recovery, that means
we cannot cnosider archive_timeout at the fist checkpointer after
recovery ends. So I think that the suggestion from Fujii-san is the
direction.

+1
If this idea has some problems, we can revisit Daisuke-san's idea.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply via email to