On 03.07.2013 21:28, Peter Eisentraut wrote:
On 6/6/13 4:09 PM, Heikki Linnakangas wrote:
Here's a patch implementing that. Docs not updated yet. I did not change
the way checkpoint_segments triggers checkpoints - that can be a
separate patch. This only decouples the segment preallocation behavior
from checkpoint_segments. With the patch, you can set
checkpoint_segments really high, without consuming that much disk space
all the time.

I don't understand what this patch, by itself, will accomplish in terms
of the originally stated goals of making checkpoint_segments easier to
tune, and controlling disk space used.  To some degree, it makes both of
these things worse, because you can no longer use checkpoint_segments to
control the disk space.  Instead, it is replaced by magic.

The patch addressed the third point in my first post:

A third point is that even if you have 10 GB of disk space reserved
for WAL, you don't want to actually consume all that 10 GB, if it's
not required to run the database smoothly. There are several reasons
for that: backups based on a filesystem-level snapshot are larger
than necessary, if there are a lot of preallocated WAL segments and
in a virtualized or shared system, there might be other VMs or
applications that could make use of the disk space. On the other
hand, you don't want to run out of disk space while writing WAL -
that can lead to a PANIC in the worst case.

What sort of behavior are you expecting to come out of this?  In testing,
I didn't see much of a difference, although I'd expect that this would
actually preallocate fewer segments than the old formula.

For example, suppose you set checkpoint_segments to 200, you temporarily generate 100 segments of WAL during an initial data load, and the normal workload generates only 20 segments between checkpoints. Without the patch, you will permanently have about 120 segments in pg_xlog, created by the spike. With the patch, the extra segments will be gradually removed after the data load, down to the level needed by the constant workload. That would be about 50 segments, assuming the default checkpoint_completion_target=0.5.


Here's a bigger patch, which does more. It is based on the ideas in the post I started this thread with, with feedback incorporated from the long discussion. With this patch, WAL disk space usage is controlled by two GUCs:

min_recycle_wal_size
checkpoint_wal_size

These GUCs act as soft minimum and maximum limits on overall WAL size. At each checkpoint, the checkpointer removes enough old WAL files to keep pg_xlog usage below checkpoint_wal_size, and recycles enough of them into future segments to reach min_recycle_wal_size. Between those limits, a self-tuning mechanism recycles just enough WAL files to get to the end of the next checkpoint without running out of preallocated WAL files. To estimate how many files that requires, a moving average of the amount of WAL generated between checkpoints is maintained. The moving average is updated with "fast-rise slow-decline" behavior, to cater for peak rather than true average use to some extent.

As today, checkpoints are triggered based on time or WAL usage, whichever comes first. WAL-based checkpoints are triggered based on the good old formula: CheckPointSegments = (checkpoint_wal_size / (2.0 + checkpoint_completion_target)) / 16MB. CheckPointSegments controls that like before, but it is now an internal variable derived from checkpoint_wal_size, not visible to users.

These settings are fairly intuitive for a DBA to tune. You begin by figuring out how much disk space you can afford to spend on WAL, and set checkpoint_wal_size to that (with some safety margin, of course). Then you set checkpoint_timeout based on how long you're willing to wait for recovery to finish. Finally, if you have infrequent batch jobs that need a lot more WAL than the system otherwise needs, you can set min_recycle_wal_size to keep enough WAL preallocated for the spikes.
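For instance, following that procedure might end with settings along these lines (values purely illustrative):

```
# postgresql.conf - illustrative values only
checkpoint_wal_size = 2GB       # disk space budget for WAL, with safety margin
checkpoint_timeout = 10min      # bounds how long crash recovery can take
min_recycle_wal_size = 512MB    # keep WAL preallocated for nightly batch jobs
```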

You can also set min_recycle_wal_size = checkpoint_wal_size, which gets you the same behavior as without the patch, except that it's more intuitive to set it in terms of "MB of WAL space required", instead of "# of segments between checkpoints".

Does that make sense? I'd love to hear feedback on how people setting up production databases would like to tune these things. The reason for the auto-tuning between the min and max is to be able to set reasonable defaults, e.g. for embedded systems that don't have a DBA to do tuning. Currently, it's very difficult to come up with a reasonable default value for checkpoint_segments that would work well for a wide range of systems. The PostgreSQL default of 3 is way too low for most systems. On the other hand, if you set it to, say, 20, that's a lot of wasted space for a small database that's not updated much. With this patch, you can set "checkpoint_wal_size=1GB", and if the database ends up actually needing only 100 MB of WAL, it will only use that much and not waste 900 MB on useless preallocated WAL files.

These GUCs are still soft limits. If the system is busy enough that the checkpointer can't reach its target, it can exceed checkpoint_wal_size. Making it a hard limit is a much bigger task than I'm willing to tackle right now.

- Heikki
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***************
*** 1036,1042 **** include 'filename'
          usually require a corresponding increase in
          <varname>checkpoint_segments</varname>, in order to spread out the
          process of writing large quantities of new or changed data over a
!         longer period of time.
         </para>
  
         <para>
--- 1036,1042 ----
          usually require a corresponding increase in
          <varname>checkpoint_segments</varname>, in order to spread out the
          process of writing large quantities of new or changed data over a
!         longer period of time. FIXME: What should we suggest here now?
         </para>
  
         <para>
***************
*** 1958,1974 **** include 'filename'
       <title>Checkpoints</title>
  
      <variablelist>
!      <varlistentry id="guc-checkpoint-segments" xreflabel="checkpoint_segments">
!       <term><varname>checkpoint_segments</varname> (<type>integer</type>)</term>
        <indexterm>
!        <primary><varname>checkpoint_segments</> configuration parameter</primary>
        </indexterm>
        <listitem>
         <para>
!         Maximum number of log file segments between automatic WAL
!         checkpoints (each segment is normally 16 megabytes). The default
!         is three segments.  Increasing this parameter can increase the
!         amount of time needed for crash recovery.
          This parameter can only be set in the <filename>postgresql.conf</>
          file or on the server command line.
         </para>
--- 1958,1977 ----
       <title>Checkpoints</title>
  
      <variablelist>
!      <varlistentry id="guc-checkpoint-wal-size" xreflabel="checkpoint_wal_size">
!       <term><varname>checkpoint_wal_size</varname> (<type>integer</type>)</term>
        <indexterm>
!        <primary><varname>checkpoint_wal_size</> configuration parameter</primary>
        </indexterm>
        <listitem>
         <para>
!         Maximum size to let the WAL grow to between automatic WAL
!         checkpoints. This is a soft limit; WAL size can exceed
!         <varname>checkpoint_wal_size</> under special circumstances, like
!         under heavy load, a failing <varname>archive_command</>, or a high
!         <varname>wal_keep_segments</> setting. The default is 256 MB.
!         Increasing this parameter can increase the amount of time needed for
!         crash recovery.
          This parameter can only be set in the <filename>postgresql.conf</>
          file or on the server command line.
         </para>
***************
*** 2028,2033 **** include 'filename'
--- 2031,2054 ----
        </listitem>
       </varlistentry>
  
+      <varlistentry id="guc-min-recycle-wal-size" xreflabel="min_recycle_wal_size">
+       <term><varname>min_recycle_wal_size</varname> (<type>integer</type>)</term>
+       <indexterm>
+        <primary><varname>min_recycle_wal_size</> configuration parameter</primary>
+       </indexterm>
+       <listitem>
+        <para>
+         As long as WAL disk usage stays below this setting, old WAL files are
+         always recycled for future use at a checkpoint, rather than removed.
+         This can be used to ensure that enough WAL space is reserved to
+         handle spikes in WAL usage, for example when running large batch
+         jobs. The default is 80 MB.
+         This parameter can only be set in the <filename>postgresql.conf</>
+         file or on the server command line.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
       </variablelist>
       </sect2>
       <sect2 id="runtime-config-wal-archiving">
*** a/doc/src/sgml/perform.sgml
--- b/doc/src/sgml/perform.sgml
***************
*** 1302,1320 **** SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
     </para>
    </sect2>
  
!   <sect2 id="populate-checkpoint-segments">
!    <title>Increase <varname>checkpoint_segments</varname></title>
  
     <para>
!     Temporarily increasing the <xref
!     linkend="guc-checkpoint-segments"> configuration variable can also
      make large data loads faster.  This is because loading a large
      amount of data into <productname>PostgreSQL</productname> will
      cause checkpoints to occur more often than the normal checkpoint
      frequency (specified by the <varname>checkpoint_timeout</varname>
      configuration variable). Whenever a checkpoint occurs, all dirty
      pages must be flushed to disk. By increasing
!     <varname>checkpoint_segments</varname> temporarily during bulk
      data loads, the number of checkpoints that are required can be
      reduced.
     </para>
--- 1302,1320 ----
     </para>
    </sect2>
  
!   <sect2 id="populate-checkpoint-wal-size">
!    <title>Increase <varname>checkpoint_wal_size</varname></title>
  
     <para>
!     Increasing the <xref
!     linkend="guc-checkpoint-wal-size"> configuration variable can also
      make large data loads faster.  This is because loading a large
      amount of data into <productname>PostgreSQL</productname> will
      cause checkpoints to occur more often than the normal checkpoint
      frequency (specified by the <varname>checkpoint_timeout</varname>
      configuration variable). Whenever a checkpoint occurs, all dirty
      pages must be flushed to disk. By increasing
!     <varname>checkpoint_wal_size</varname> temporarily during bulk
      data loads, the number of checkpoints that are required can be
      reduced.
     </para>
***************
*** 1419,1425 **** SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
        <para>
         Set appropriate (i.e., larger than normal) values for
         <varname>maintenance_work_mem</varname> and
!        <varname>checkpoint_segments</varname>.
        </para>
       </listitem>
       <listitem>
--- 1419,1425 ----
        <para>
         Set appropriate (i.e., larger than normal) values for
         <varname>maintenance_work_mem</varname> and
!        <varname>checkpoint_wal_size</varname>.
        </para>
       </listitem>
       <listitem>
***************
*** 1486,1492 **** SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
  
      So when loading a data-only dump, it is up to you to drop and recreate
      indexes and foreign keys if you wish to use those techniques.
!     It's still useful to increase <varname>checkpoint_segments</varname>
      while loading the data, but don't bother increasing
      <varname>maintenance_work_mem</varname>; rather, you'd do that while
      manually recreating indexes and foreign keys afterwards.
--- 1486,1492 ----
  
      So when loading a data-only dump, it is up to you to drop and recreate
      indexes and foreign keys if you wish to use those techniques.
!     It's still useful to increase <varname>checkpoint_wal_size</varname>
      while loading the data, but don't bother increasing
      <varname>maintenance_work_mem</varname>; rather, you'd do that while
      manually recreating indexes and foreign keys afterwards.
***************
*** 1542,1548 **** SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
  
       <listitem>
        <para>
!        Increase <xref linkend="guc-checkpoint-segments"> and <xref
         linkend="guc-checkpoint-timeout"> ; this reduces the frequency
         of checkpoints, but increases the storage requirements of
         <filename>/pg_xlog</>.
--- 1542,1548 ----
  
       <listitem>
        <para>
!        Increase <xref linkend="guc-checkpoint-wal-size"> and <xref
         linkend="guc-checkpoint-timeout"> ; this reduces the frequency
         of checkpoints, but increases the storage requirements of
         <filename>/pg_xlog</>.
*** a/doc/src/sgml/wal.sgml
--- b/doc/src/sgml/wal.sgml
***************
*** 471,479 ****
    <para>
     The server's checkpointer process automatically performs
     a checkpoint every so often.  A checkpoint is begun every <xref
!    linkend="guc-checkpoint-segments"> log segments, or every <xref
!    linkend="guc-checkpoint-timeout"> seconds, whichever comes first.
!    The default settings are 3 segments and 300 seconds (5 minutes), respectively.
     If no WAL has been written since the previous checkpoint, new checkpoints
     will be skipped even if <varname>checkpoint_timeout</> has passed.
     (If WAL archiving is being used and you want to put a lower limit on how
--- 471,480 ----
    <para>
     The server's checkpointer process automatically performs
     a checkpoint every so often.  A checkpoint is begun every <xref
!    linkend="guc-checkpoint-timeout"> seconds, or if
!    <xref linkend="guc-checkpoint-wal-size"> is about to be exceeded, whichever
!    comes first.
!    The default settings are 5 minutes and 256 MB, respectively.
     If no WAL has been written since the previous checkpoint, new checkpoints
     will be skipped even if <varname>checkpoint_timeout</> has passed.
     (If WAL archiving is being used and you want to put a lower limit on how
***************
*** 485,492 ****
    </para>
  
    <para>
!    Reducing <varname>checkpoint_segments</varname> and/or
!    <varname>checkpoint_timeout</varname> causes checkpoints to occur
     more often. This allows faster after-crash recovery, since less work
     will need to be redone. However, one must balance this against the
     increased cost of flushing dirty data pages more often. If
--- 486,493 ----
    </para>
  
    <para>
!    Reducing <varname>checkpoint_timeout</varname> and/or
!    <varname>checkpoint_wal_size</varname> causes checkpoints to occur
     more often. This allows faster after-crash recovery, since less work
     will need to be redone. However, one must balance this against the
     increased cost of flushing dirty data pages more often. If
***************
*** 509,519 ****
     parameter.  If checkpoints happen closer together than
     <varname>checkpoint_warning</> seconds,
     a message will be output to the server log recommending increasing
!    <varname>checkpoint_segments</varname>.  Occasional appearance of such
     a message is not cause for alarm, but if it appears often then the
     checkpoint control parameters should be increased. Bulk operations such
     as large <command>COPY</> transfers might cause a number of such warnings
!    to appear if you have not set <varname>checkpoint_segments</> high
     enough.
    </para>
  
--- 510,520 ----
     parameter.  If checkpoints happen closer together than
     <varname>checkpoint_warning</> seconds,
     a message will be output to the server log recommending increasing
!    <varname>checkpoint_wal_size</varname>.  Occasional appearance of such
     a message is not cause for alarm, but if it appears often then the
     checkpoint control parameters should be increased. Bulk operations such
     as large <command>COPY</> transfers might cause a number of such warnings
!    to appear if you have not set <varname>checkpoint_wal_size</> high
     enough.
    </para>
  
***************
*** 524,533 ****
     <xref linkend="guc-checkpoint-completion-target">, which is
     given as a fraction of the checkpoint interval.
     The I/O rate is adjusted so that the checkpoint finishes when the
!    given fraction of <varname>checkpoint_segments</varname> WAL segments
!    have been consumed since checkpoint start, or the given fraction of
!    <varname>checkpoint_timeout</varname> seconds have elapsed,
!    whichever is sooner.  With the default value of 0.5,
     <productname>PostgreSQL</> can be expected to complete each checkpoint
     in about half the time before the next checkpoint starts.  On a system
     that's very close to maximum I/O throughput during normal operation,
--- 525,534 ----
     <xref linkend="guc-checkpoint-completion-target">, which is
     given as a fraction of the checkpoint interval.
     The I/O rate is adjusted so that the checkpoint finishes when the
!    given fraction of
!    <varname>checkpoint_timeout</varname> seconds have elapsed, or before
!    <varname>checkpoint_wal_size</varname> is exceeded, whichever is sooner.
!    With the default value of 0.5,
     <productname>PostgreSQL</> can be expected to complete each checkpoint
     in about half the time before the next checkpoint starts.  On a system
     that's very close to maximum I/O throughput during normal operation,
***************
*** 544,561 ****
    </para>
  
    <para>
!    There will always be at least one WAL segment file, and will normally
!    not be more than (2 + <varname>checkpoint_completion_target</varname>) * <varname>checkpoint_segments</varname> + 1
!    or <varname>checkpoint_segments</> + <xref linkend="guc-wal-keep-segments"> + 1
!    files.  Each segment file is normally 16 MB (though this size can be
!    altered when building the server).  You can use this to estimate space
!    requirements for <acronym>WAL</acronym>.
!    Ordinarily, when old log segment files are no longer needed, they
!    are recycled (that is, renamed to become future segments in the numbered
!    sequence). If, due to a short-term peak of log output rate, there
!    are more than 3 * <varname>checkpoint_segments</varname> + 1
!    segment files, the unneeded segment files will be deleted instead
!    of recycled until the system gets back under this limit.
    </para>
  
    <para>
--- 545,577 ----
    </para>
  
    <para>
!    The number of WAL segment files in the <filename>pg_xlog</> directory depends on
!    <varname>checkpoint_wal_size</>, <varname>min_recycle_wal_size</> and the
!    amount of WAL generated in previous checkpoint cycles. When old log
!    segment files are no longer needed, they are removed or recycled (that is,
!    renamed to become future segments in the numbered sequence). If, due to a
!    short-term peak of log output rate, <varname>checkpoint_wal_size</> is
!    exceeded, the unneeded segment files will be removed until the system
!    gets back under this limit. Below that limit, the system recycles enough
!    WAL files to cover the estimated need until the next checkpoint, and
!    removes the rest. The estimate is based on a moving average of the number
!    of WAL files used in previous checkpoint cycles. The moving average
!    is increased immediately if the actual usage exceeds the estimate, so it
!    accommodates peak usage rather than average usage to some extent.
!    <varname>min_recycle_wal_size</> puts a minimum on the amount of WAL files
!    recycled for future usage; that much WAL is always recycled for future use,
!    even if the system is idle and the WAL usage estimate suggests that little
!    WAL is needed.
!   </para>
! 
!   <para>
!    Independently of <varname>checkpoint_wal_size</varname>,
!    <xref linkend="guc-wal-keep-segments"> + 1 most recent WAL files are
!    kept at all times. Also, if WAL archiving is used, old segments cannot be
!    removed or recycled until they are archived. If WAL archiving cannot keep up
!    with the pace at which WAL is generated, or if <varname>archive_command</varname>
!    fails repeatedly, old WAL files will accumulate in <filename>pg_xlog</>
!    until the situation is resolved.
    </para>
  
    <para>
***************
*** 570,578 ****
     master because restartpoints can only be performed at checkpoint records.
     A restartpoint is triggered when a checkpoint record is reached if at
     least <varname>checkpoint_timeout</> seconds have passed since the last
!    restartpoint. In standby mode, a restartpoint is also triggered if at
!    least <varname>checkpoint_segments</> log segments have been replayed
!    since the last restartpoint.
    </para>
  
    <para>
--- 586,593 ----
     master because restartpoints can only be performed at checkpoint records.
     A restartpoint is triggered when a checkpoint record is reached if at
     least <varname>checkpoint_timeout</> seconds have passed since the last
!    restartpoint, or if WAL size is about to exceed
!    <varname>checkpoint_wal_size</>.
    </para>
  
    <para>
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 71,77 **** extern uint32 bootstrap_data_checksum_version;
  
  
  /* User-settable parameters */
! int			CheckPointSegments = 3;
  int			wal_keep_segments = 0;
  int			XLOGbuffers = -1;
  int			XLogArchiveTimeout = 0;
--- 71,78 ----
  
  
  /* User-settable parameters */
! int			checkpoint_wal_size = 262144;	/* 256 MB */
! int			min_recycle_wal_size = 81920;	/* 80 MB */
  int			wal_keep_segments = 0;
  int			XLOGbuffers = -1;
  int			XLogArchiveTimeout = 0;
***************
*** 86,108 **** int			CommitDelay = 0;	/* precommit delay in microseconds */
  int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
  int			num_xloginsert_slots = 8;
  
  #ifdef WAL_DEBUG
  bool		XLOG_DEBUG = false;
  #endif
  
! /*
!  * XLOGfileslop is the maximum number of preallocated future XLOG segments.
!  * When we are done with an old XLOG segment file, we will recycle it as a
!  * future XLOG segment as long as there aren't already XLOGfileslop future
!  * segments; else we'll delete it.  This could be made a separate GUC
!  * variable, but at present I think it's sufficient to hardwire it as
!  * 2*CheckPointSegments+1.	Under normal conditions, a checkpoint will free
!  * no more than 2*CheckPointSegments log segments, and we want to recycle all
!  * of them; the +1 allows boundary cases to happen without wasting a
!  * delete/create-segment cycle.
!  */
! #define XLOGfileslop	(2*CheckPointSegments + 1)
! 
  
  /*
   * GUC support
--- 87,105 ----
  int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
  int			num_xloginsert_slots = 8;
  
+ /*
+  * Max distance from last checkpoint, before triggering a new xlog-based
+  * checkpoint.
+  */
+ int			CheckPointSegments;
+ 
  #ifdef WAL_DEBUG
  bool		XLOG_DEBUG = false;
  #endif
  
! /* Estimated distance between checkpoints, in bytes */
! static double CheckPointDistanceEstimate = 0;
! static double PrevCheckPointDistance = 0;
  
  /*
   * GUC support
***************
*** 740,746 **** static void AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic);
  static bool XLogCheckpointNeeded(XLogSegNo new_segno);
  static void XLogWrite(XLogwrtRqst WriteRqst, bool flexible);
  static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
! 					   bool find_free, int *max_advance,
  					   bool use_lock);
  static int XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
  			 int source, bool notexistOk);
--- 737,743 ----
  static bool XLogCheckpointNeeded(XLogSegNo new_segno);
  static void XLogWrite(XLogwrtRqst WriteRqst, bool flexible);
  static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
! 					   bool find_free, XLogSegNo max_segno,
  					   bool use_lock);
  static int XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
  			 int source, bool notexistOk);
***************
*** 753,759 **** static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
  static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
  static void XLogFileClose(void);
  static void PreallocXlogFiles(XLogRecPtr endptr);
! static void RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr);
  static void UpdateLastRemovedPtr(char *filename);
  static void ValidateXLOGDirectoryStructure(void);
  static void CleanupBackupHistory(void);
--- 750,756 ----
  static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
  static void XLogFileClose(void);
  static void PreallocXlogFiles(XLogRecPtr endptr);
! static void RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr);
  static void UpdateLastRemovedPtr(char *filename);
  static void ValidateXLOGDirectoryStructure(void);
  static void CleanupBackupHistory(void);
***************
*** 2548,2553 **** AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
--- 2545,2653 ----
  }
  
  /*
+  * Calculate CheckPointSegments based on checkpoint_wal_size and
+  * checkpoint_completion_target.
+  */
+ static void
+ CalculateCheckpointSegments(void)
+ {
+ 	double		target;
+ 
+ 	/*-------
+ 	 * Calculate the distance at which to trigger a checkpoint, to avoid
+ 	 * exceeding checkpoint_wal_size. This is based on two assumptions:
+ 	 *
+ 	 * a) we keep WAL for two checkpoint cycles, back to the "prev" checkpoint.
+ 	 * b) during checkpoint, we consume checkpoint_completion_target *
+ 	 *    number of segments consumed between checkpoints.
+ 	 *-------
+ 	 */
+ 	target = (double ) checkpoint_wal_size / (double) (XLOG_SEG_SIZE / 1024);
+ 	target = target / (2.0 + CheckPointCompletionTarget);
+ 
+ 	/* round down */
+ 	CheckPointSegments = (int) target;
+ 
+ 	if (CheckPointSegments < 1)
+ 		CheckPointSegments = 1;
+ }
+ 
+ void
+ assign_checkpoint_wal_size(int newval, void *extra)
+ {
+ 	checkpoint_wal_size = newval;
+ 	CalculateCheckpointSegments();
+ }
+ 
+ void
+ assign_checkpoint_completion_target(double newval, void *extra)
+ {
+ 	CheckPointCompletionTarget = newval;
+ 	CalculateCheckpointSegments();
+ }
+ 
+ /*
+  * At a checkpoint, how many WAL segments to recycle as preallocated future
+  * XLOG segments? Returns the highest segment that should be preallocated.
+  */
+ static XLogSegNo
+ XLOGfileslop(XLogRecPtr PriorRedoPtr)
+ {
+ 	double		nsegments;
+ 	XLogSegNo	minSegNo;
+ 	XLogSegNo	maxSegNo;
+ 	double		distance;
+ 	XLogSegNo	recycleSegNo;
+ 
+ 	/*
+ 	 * Calculate the segment numbers that min_recycle_wal_size and
+ 	 * checkpoint_wal_size correspond to. Always recycle enough segments
+ 	 * to meet the minimum, and remove enough segments to stay below the
+ 	 * maximum.
+ 	 */
+ 	nsegments = (double) min_recycle_wal_size / (double) (XLOG_SEG_SIZE / 1024);
+ 	minSegNo = PriorRedoPtr / XLOG_SEG_SIZE + (int) nsegments;
+ 	nsegments = (double) checkpoint_wal_size / (double) (XLOG_SEG_SIZE / 1024);
+ 	maxSegNo =  PriorRedoPtr / XLOG_SEG_SIZE + (int) nsegments;
+ 
+ 	/*
+ 	 * Between those limits, recycle enough segments to get us through to the
+ 	 * estimated end of next checkpoint.
+ 	 *
+ 	 * To estimate where the next checkpoint will finish, assume that the
+ 	 * system runs steadily consuming CheckPointDistanceEstimate
+ 	 * bytes between every checkpoint.
+ 	 *
+ 	 * The reason this calculation is done from the prior checkpoint, not the
+ 	 * one that just finished, is that this behaves better if some checkpoint
+ 	 * cycles are abnormally short, like if you perform a manual checkpoint
+ 	 * right after a timed one. The manual checkpoint will make almost a full
+ 	 * cycle's worth of WAL segments available for recycling, because the
+ 	 * segments from the prior's prior, fully-sized checkpoint cycle are no
+ 	 * longer needed. However, the next checkpoint will make only few segments
+ 	 * available for recycling, the ones generated between the timed
+ 	 * checkpoint and the manual one right after that. If at the manual
+ 	 * checkpoint we only retained enough segments to get us to the next timed
+ 	 * one, and removed the rest, then at the next checkpoint we would not
+ 	 * have enough segments around for recycling, to get us to the checkpoint
+ 	 * after that. Basing the calculations on the distance from the prior redo
+ 	 * pointer largely fixes that problem.
+ 	 */
+ 	distance = (2.0 + CheckPointCompletionTarget) * CheckPointDistanceEstimate;
+ 	/* add 10% for good measure. */
+ 	distance *= 1.10;
+ 
+ 	recycleSegNo = (XLogSegNo) ceil(((double) PriorRedoPtr + distance) / XLOG_SEG_SIZE);
+ 
+ 	if (recycleSegNo < minSegNo)
+ 		recycleSegNo = minSegNo;
+ 	if (recycleSegNo > maxSegNo)
+ 		recycleSegNo = maxSegNo;
+ 
+ 	return recycleSegNo;
+ }
+ 
+ /*
   * Check whether we've consumed enough xlog space that a checkpoint is needed.
   *
   * new_segno indicates a log file that has just been filled up (or read
***************
*** 3345,3351 **** XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
  	char		path[MAXPGPATH];
  	char		tmppath[MAXPGPATH];
  	XLogSegNo	installed_segno;
! 	int			max_advance;
  	int			fd;
  	bool		zero_fill = true;
  
--- 3445,3451 ----
  	char		path[MAXPGPATH];
  	char		tmppath[MAXPGPATH];
  	XLogSegNo	installed_segno;
! 	XLogSegNo	max_segno;
  	int			fd;
  	bool		zero_fill = true;
  
***************
*** 3472,3480 **** XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
  	 * pre-create a future log segment.
  	 */
  	installed_segno = logsegno;
! 	max_advance = XLOGfileslop;
  	if (!InstallXLogFileSegment(&installed_segno, tmppath,
! 								*use_existent, &max_advance,
  								use_lock))
  	{
  		/*
--- 3572,3590 ----
  	 * pre-create a future log segment.
  	 */
  	installed_segno = logsegno;
! 
! 	/*
! 	 * XXX: What should we use as max_segno? We used to use XLOGfileslop when
! 	 * that was a constant, but that was always a bit dubious: normally, at a
! 	 * checkpoint, XLOGfileslop was the offset from the checkpoint record,
! 	 * but here, it was the offset from the insert location. We can't do the
! 	 * normal XLOGfileslop calculation here because we don't have access to
! 	 * the prior checkpoint's redo location. So somewhat arbitrarily, just
! 	 * use CheckPointSegments.
! 	 */
! 	max_segno = logsegno + CheckPointSegments;
  	if (!InstallXLogFileSegment(&installed_segno, tmppath,
! 								*use_existent, max_segno,
  								use_lock))
  	{
  		/*
***************
*** 3597,3603 **** XLogFileCopy(XLogSegNo destsegno, TimeLineID srcTLI, XLogSegNo srcsegno)
  	/*
  	 * Now move the segment into place with its final name.
  	 */
! 	if (!InstallXLogFileSegment(&destsegno, tmppath, false, NULL, false))
  		elog(ERROR, "InstallXLogFileSegment should not have failed");
  }
  
--- 3707,3713 ----
  	/*
  	 * Now move the segment into place with its final name.
  	 */
! 	if (!InstallXLogFileSegment(&destsegno, tmppath, false, 0, false))
  		elog(ERROR, "InstallXLogFileSegment should not have failed");
  }
  
***************
*** 3617,3638 **** XLogFileCopy(XLogSegNo destsegno, TimeLineID srcTLI, XLogSegNo srcsegno)
   * number at or after the passed numbers.  If FALSE, install the new segment
   * exactly where specified, deleting any existing segment file there.
   *
!  * *max_advance: maximum number of segno slots to advance past the starting
!  * point.  Fail if no free slot is found in this range.  On return, reduced
!  * by the number of slots skipped over.  (Irrelevant, and may be NULL,
!  * when find_free is FALSE.)
   *
   * use_lock: if TRUE, acquire ControlFileLock while moving file into
   * place.  This should be TRUE except during bootstrap log creation.  The
   * caller must *not* hold the lock at call.
   *
   * Returns TRUE if the file was installed successfully.  FALSE indicates that
!  * max_advance limit was exceeded, or an error occurred while renaming the
   * file into place.
   */
  static bool
  InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
! 					   bool find_free, int *max_advance,
  					   bool use_lock)
  {
  	char		path[MAXPGPATH];
--- 3727,3747 ----
   * number at or after the passed numbers.  If FALSE, install the new segment
   * exactly where specified, deleting any existing segment file there.
   *
!  * max_segno: maximum segment number to install the new file as.  Fail if no
!  * free slot is found between *segno and max_segno. (Ignored when find_free
!  * is FALSE.)
   *
   * use_lock: if TRUE, acquire ControlFileLock while moving file into
   * place.  This should be TRUE except during bootstrap log creation.  The
   * caller must *not* hold the lock at call.
   *
   * Returns TRUE if the file was installed successfully.  FALSE indicates that
!  * max_segno limit was exceeded, or an error occurred while renaming the
   * file into place.
   */
  static bool
  InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
! 					   bool find_free, XLogSegNo max_segno,
  					   bool use_lock)
  {
  	char		path[MAXPGPATH];
***************
*** 3656,3662 **** InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
  		/* Find a free slot to put it in */
  		while (stat(path, &stat_buf) == 0)
  		{
! 			if (*max_advance <= 0)
  			{
  				/* Failed to find a free slot within specified range */
  				if (use_lock)
--- 3765,3771 ----
  		/* Find a free slot to put it in */
  		while (stat(path, &stat_buf) == 0)
  		{
! 			if ((*segno) >= max_segno)
  			{
  				/* Failed to find a free slot within specified range */
  				if (use_lock)
***************
*** 3664,3670 **** InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
  				return false;
  			}
  			(*segno)++;
- 			(*max_advance)--;
  			XLogFilePath(path, ThisTimeLineID, *segno);
  		}
  	}
--- 3773,3778 ----
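The hunk above replaces the old in/out `*max_advance` counter with a plain `max_segno` upper bound. As a minimal illustration (a Python sketch, not the C implementation; the hypothetical `exists` predicate stands in for the `stat()` probe), the free-slot search in `InstallXLogFileSegment()` now behaves like this:

```python
def find_free_slot(exists, segno, max_segno):
    # Sketch of the patched search loop: advance segno past occupied
    # slots, but fail (return None) once advancing would move past
    # max_segno, i.e. no free slot exists within [segno, max_segno].
    while exists(segno):
        if segno >= max_segno:
            return None
        segno += 1
    return segno
```

For example, with segments 3..5 occupied, `find_free_slot(lambda s: s in {3, 4, 5}, 3, 10)` lands on slot 6, while a `max_segno` of 4 makes the search fail.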
***************
*** 3997,4010 **** UpdateLastRemovedPtr(char *filename)
  /*
   * Recycle or remove all log files older or equal to passed segno
   *
!  * endptr is current (or recent) end of xlog; this is used to determine
   * whether we want to recycle rather than delete no-longer-wanted log files.
   */
  static void
! RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr)
  {
  	XLogSegNo	endlogSegNo;
! 	int			max_advance;
  	DIR		   *xldir;
  	struct dirent *xlde;
  	char		lastoff[MAXFNAMELEN];
--- 4105,4119 ----
  /*
   * Recycle or remove all log files older or equal to passed segno
   *
!  * endptr is current (or recent) end of xlog, and PriorRedoPtr is the
!  * redo pointer of the previous checkpoint. These are used to determine
   * whether we want to recycle rather than delete no-longer-wanted log files.
   */
  static void
! RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
  {
  	XLogSegNo	endlogSegNo;
! 	XLogSegNo	recycleSegNo;
  	DIR		   *xldir;
  	struct dirent *xlde;
  	char		lastoff[MAXFNAMELEN];
***************
*** 4016,4026 **** RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr)
  	struct stat statbuf;
  
  	/*
! 	 * Initialize info about where to try to recycle to.  We allow recycling
! 	 * segments up to XLOGfileslop segments beyond the current XLOG location.
  	 */
  	XLByteToPrevSeg(endptr, endlogSegNo);
! 	max_advance = XLOGfileslop;
  
  	xldir = AllocateDir(XLOGDIR);
  	if (xldir == NULL)
--- 4125,4134 ----
  	struct stat statbuf;
  
  	/*
! 	 * Initialize info about where to try to recycle to.
  	 */
  	XLByteToPrevSeg(endptr, endlogSegNo);
! 	recycleSegNo = XLOGfileslop(PriorRedoPtr);
  
  	xldir = AllocateDir(XLOGDIR);
  	if (xldir == NULL)
***************
*** 4069,4088 **** RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr)
  				 * for example can create symbolic links pointing to a
  				 * separate archive directory.
  				 */
! 				if (lstat(path, &statbuf) == 0 && S_ISREG(statbuf.st_mode) &&
  					InstallXLogFileSegment(&endlogSegNo, path,
! 										   true, &max_advance, true))
  				{
  					ereport(DEBUG2,
  							(errmsg("recycled transaction log file \"%s\"",
  									xlde->d_name)));
  					CheckpointStats.ckpt_segs_recycled++;
  					/* Needn't recheck that slot on future iterations */
! 					if (max_advance > 0)
! 					{
! 						endlogSegNo++;
! 						max_advance--;
! 					}
  				}
  				else
  				{
--- 4177,4193 ----
  				 * for example can create symbolic links pointing to a
  				 * separate archive directory.
  				 */
! 				if (endlogSegNo <= recycleSegNo &&
! 					lstat(path, &statbuf) == 0 && S_ISREG(statbuf.st_mode) &&
  					InstallXLogFileSegment(&endlogSegNo, path,
! 										   true, recycleSegNo, true))
  				{
  					ereport(DEBUG2,
  							(errmsg("recycled transaction log file \"%s\"",
  									xlde->d_name)));
  					CheckpointStats.ckpt_segs_recycled++;
  					/* Needn't recheck that slot on future iterations */
! 					endlogSegNo++;
  				}
  				else
  				{
***************
*** 7863,7869 **** LogCheckpointEnd(bool restartpoint)
  		elog(LOG, "restartpoint complete: wrote %d buffers (%.1f%%); "
  			 "%d transaction log file(s) added, %d removed, %d recycled; "
  			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
! 			 "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s",
  			 CheckpointStats.ckpt_bufs_written,
  			 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
  			 CheckpointStats.ckpt_segs_added,
--- 7968,7975 ----
  		elog(LOG, "restartpoint complete: wrote %d buffers (%.1f%%); "
  			 "%d transaction log file(s) added, %d removed, %d recycled; "
  			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
! 			 "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
! 			 "distance=%d KB, estimate=%d KB",
  			 CheckpointStats.ckpt_bufs_written,
  			 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
  			 CheckpointStats.ckpt_segs_added,
***************
*** 7874,7885 **** LogCheckpointEnd(bool restartpoint)
  			 total_secs, total_usecs / 1000,
  			 CheckpointStats.ckpt_sync_rels,
  			 longest_secs, longest_usecs / 1000,
! 			 average_secs, average_usecs / 1000);
  	else
  		elog(LOG, "checkpoint complete: wrote %d buffers (%.1f%%); "
  			 "%d transaction log file(s) added, %d removed, %d recycled; "
  			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
! 			 "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s",
  			 CheckpointStats.ckpt_bufs_written,
  			 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
  			 CheckpointStats.ckpt_segs_added,
--- 7980,7994 ----
  			 total_secs, total_usecs / 1000,
  			 CheckpointStats.ckpt_sync_rels,
  			 longest_secs, longest_usecs / 1000,
! 			 average_secs, average_usecs / 1000,
! 			 (int) (PrevCheckPointDistance / 1024.0),
! 			 (int) (CheckPointDistanceEstimate / 1024.0));
  	else
  		elog(LOG, "checkpoint complete: wrote %d buffers (%.1f%%); "
  			 "%d transaction log file(s) added, %d removed, %d recycled; "
  			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
! 			 "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
! 			 "distance=%d KB, estimate=%d KB",
  			 CheckpointStats.ckpt_bufs_written,
  			 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
  			 CheckpointStats.ckpt_segs_added,
***************
*** 7890,7896 **** LogCheckpointEnd(bool restartpoint)
  			 total_secs, total_usecs / 1000,
  			 CheckpointStats.ckpt_sync_rels,
  			 longest_secs, longest_usecs / 1000,
! 			 average_secs, average_usecs / 1000);
  }
  
  /*
--- 7999,8046 ----
  			 total_secs, total_usecs / 1000,
  			 CheckpointStats.ckpt_sync_rels,
  			 longest_secs, longest_usecs / 1000,
! 			 average_secs, average_usecs / 1000,
! 			 (int) (PrevCheckPointDistance / 1024.0),
! 			 (int) (CheckPointDistanceEstimate / 1024.0));
! }
! 
! /*
!  * Update the estimate of distance between checkpoints.
!  *
!  * The estimate is used to calculate the number of WAL segments to keep
!  * preallocated; see XLOGfileslop().
!  */
! static void
! UpdateCheckPointDistanceEstimate(uint64 nbytes)
! {
! 	/*
! 	 * To estimate the number of segments consumed between checkpoints, keep
! 	 * a moving average of the actual number of segments consumed in previous
! 	 * checkpoint cycles. However, if the load is bursty, with quiet periods
! 	 * and busy periods, we want to cater for the peak load. So instead of a
! 	 * plain moving average, let the average decline slowly if the previous
! 	 * cycle used less WAL than estimated, but bump it up immediately if it
! 	 * used more.
! 	 *
! 	 * When checkpoints are triggered by checkpoint_wal_size, this should
! 	 * converge to CheckPointSegments * XLOG_SEG_SIZE.
! 	 *
! 	 * Note: this doesn't pay any attention to what caused the checkpoint.
! 	 * Checkpoints triggered manually with the CHECKPOINT command, or by
! 	 * e.g. starting a base backup, are counted the same as those created
! 	 * automatically. The slow decline will largely mask them out, if they
! 	 * are not frequent. If they are frequent, it seems reasonable to count
! 	 * them the same as any others; if you issue a manual checkpoint every
! 	 * 5 minutes and never let a timed checkpoint happen, it makes sense to
! 	 * base the preallocation on that 5-minute interval rather than whatever
! 	 * checkpoint_timeout is set to.
! 	 */
! 	PrevCheckPointDistance = nbytes;
! 	if (CheckPointDistanceEstimate < nbytes)
! 		CheckPointDistanceEstimate = nbytes;
! 	else
! 		CheckPointDistanceEstimate =
! 			(0.90 * CheckPointDistanceEstimate + 0.10 * (double) nbytes);
  }
  
  /*
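The asymmetric update rule the comment above describes — bump the estimate immediately when a cycle consumes more WAL than expected, let it decline slowly otherwise — can be sketched in Python; the 0.90/0.10 weights come straight from the patch:

```python
def update_checkpoint_distance_estimate(estimate, nbytes):
    # Mirror of UpdateCheckPointDistanceEstimate(): react at once to a
    # busier-than-estimated cycle, but let the estimate decay by only
    # 10% per quieter-than-estimated cycle, so bursty peak load still
    # dominates the preallocation target.
    if estimate < nbytes:
        return float(nbytes)
    return 0.90 * estimate + 0.10 * nbytes
```

So a 200-segment burst lifts a 100-segment estimate to 200 in one step, while a return to 100 segments only eases it back to 190 on the next cycle.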
***************
*** 7932,7938 **** CreateCheckPoint(int flags)
  	XLogCtlInsert *Insert = &XLogCtl->Insert;
  	XLogRecData rdata;
  	uint32		freespace;
! 	XLogSegNo	_logSegNo;
  	XLogRecPtr	curInsert;
  	VirtualTransactionId *vxids;
  	int			nvxids;
--- 8082,8088 ----
  	XLogCtlInsert *Insert = &XLogCtl->Insert;
  	XLogRecData rdata;
  	uint32		freespace;
! 	XLogRecPtr	PriorRedoPtr;
  	XLogRecPtr	curInsert;
  	VirtualTransactionId *vxids;
  	int			nvxids;
***************
*** 8237,8246 **** CreateCheckPoint(int flags)
  				(errmsg("concurrent transaction log activity while database system is shutting down")));
  
  	/*
! 	 * Select point at which we can truncate the log, which we base on the
! 	 * prior checkpoint's earliest info.
  	 */
! 	XLByteToSeg(ControlFile->checkPointCopy.redo, _logSegNo);
  
  	/*
  	 * Update the control file.
--- 8387,8396 ----
  				(errmsg("concurrent transaction log activity while database system is shutting down")));
  
  	/*
! 	 * Remember the prior checkpoint's redo pointer, used later to determine
! 	 * the point where the log can be truncated.
  	 */
! 	PriorRedoPtr = ControlFile->checkPointCopy.redo;
  
  	/*
  	 * Update the control file.
***************
*** 8294,8304 **** CreateCheckPoint(int flags)
  	 * Delete old log files (those no longer needed even for previous
  	 * checkpoint or the standbys in XLOG streaming).
  	 */
! 	if (_logSegNo)
  	{
  		KeepLogSeg(recptr, &_logSegNo);
  		_logSegNo--;
! 		RemoveOldXlogFiles(_logSegNo, recptr);
  	}
  
  	/*
--- 8444,8460 ----
  	 * Delete old log files (those no longer needed even for previous
  	 * checkpoint or the standbys in XLOG streaming).
  	 */
! 	if (PriorRedoPtr != InvalidXLogRecPtr)
  	{
+ 		XLogSegNo	_logSegNo;
+ 
+ 		/* Update the average distance between checkpoints. */
+ 		UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
+ 
+ 		XLByteToSeg(PriorRedoPtr, _logSegNo);
  		KeepLogSeg(recptr, &_logSegNo);
  		_logSegNo--;
! 		RemoveOldXlogFiles(_logSegNo, PriorRedoPtr, recptr);
  	}
  
  	/*
***************
*** 8486,8492 **** CreateRestartPoint(int flags)
  {
  	XLogRecPtr	lastCheckPointRecPtr;
  	CheckPoint	lastCheckPoint;
! 	XLogSegNo	_logSegNo;
  	TimestampTz xtime;
  
  	/* use volatile pointer to prevent code rearrangement */
--- 8642,8648 ----
  {
  	XLogRecPtr	lastCheckPointRecPtr;
  	CheckPoint	lastCheckPoint;
! 	XLogRecPtr	PriorRedoPtr;
  	TimestampTz xtime;
  
  	/* use volatile pointer to prevent code rearrangement */
***************
*** 8554,8560 **** CreateRestartPoint(int flags)
  	/*
  	 * Update the shared RedoRecPtr so that the startup process can calculate
  	 * the number of segments replayed since last restartpoint, and request a
! 	 * restartpoint if it exceeds checkpoint_segments.
  	 *
  	 * Like in CreateCheckPoint(), hold off insertions to update it, although
  	 * during recovery this is just pro forma, because no WAL insertions are
--- 8710,8716 ----
  	/*
  	 * Update the shared RedoRecPtr so that the startup process can calculate
  	 * the number of segments replayed since last restartpoint, and request a
! 	 * restartpoint if it exceeds CheckPointSegments.
  	 *
  	 * Like in CreateCheckPoint(), hold off insertions to update it, although
  	 * during recovery this is just pro forma, because no WAL insertions are
***************
*** 8585,8594 **** CreateRestartPoint(int flags)
  	CheckPointGuts(lastCheckPoint.redo, flags);
  
  	/*
! 	 * Select point at which we can truncate the xlog, which we base on the
! 	 * prior checkpoint's earliest info.
  	 */
! 	XLByteToSeg(ControlFile->checkPointCopy.redo, _logSegNo);
  
  	/*
  	 * Update pg_control, using current time.  Check that it still shows
--- 8741,8750 ----
  	CheckPointGuts(lastCheckPoint.redo, flags);
  
  	/*
! 	 * Remember the prior checkpoint's redo pointer, used later to determine
! 	 * the point at which we can truncate the log.
  	 */
! 	PriorRedoPtr = ControlFile->checkPointCopy.redo;
  
  	/*
  	 * Update pg_control, using current time.  Check that it still shows
***************
*** 8615,8626 **** CreateRestartPoint(int flags)
  	 * checkpoint/restartpoint) to prevent the disk holding the xlog from
  	 * growing full.
  	 */
! 	if (_logSegNo)
  	{
  		XLogRecPtr	receivePtr;
  		XLogRecPtr	replayPtr;
  		TimeLineID	replayTLI;
  		XLogRecPtr	endptr;
  
  		/*
  		 * Get the current end of xlog replayed or received, whichever is
--- 8771,8785 ----
  	 * checkpoint/restartpoint) to prevent the disk holding the xlog from
  	 * growing full.
  	 */
! 	if (PriorRedoPtr != InvalidXLogRecPtr)
  	{
  		XLogRecPtr	receivePtr;
  		XLogRecPtr	replayPtr;
  		TimeLineID	replayTLI;
  		XLogRecPtr	endptr;
+ 		XLogSegNo	_logSegNo;
+ 
+ 		XLByteToSeg(PriorRedoPtr, _logSegNo);
  
  		/*
  		 * Get the current end of xlog replayed or received, whichever is
***************
*** 8649,8655 **** CreateRestartPoint(int flags)
  		if (RecoveryInProgress())
  			ThisTimeLineID = replayTLI;
  
! 		RemoveOldXlogFiles(_logSegNo, endptr);
  
  		/*
  		 * Make more log segments if needed.  (Do this after recycling old log
--- 8808,8814 ----
  		if (RecoveryInProgress())
  			ThisTimeLineID = replayTLI;
  
! 		RemoveOldXlogFiles(_logSegNo, PriorRedoPtr, endptr);
  
  		/*
  		 * Make more log segments if needed.  (Do this after recycling old log
*** a/src/backend/postmaster/checkpointer.c
--- b/src/backend/postmaster/checkpointer.c
***************
*** 482,488 **** CheckpointerMain(void)
  				"checkpoints are occurring too frequently (%d seconds apart)",
  									   elapsed_secs,
  									   elapsed_secs),
! 						 errhint("Consider increasing the configuration parameter \"checkpoint_segments\".")));
  
  			/*
  			 * Initialize checkpointer-private variables used during
--- 482,488 ----
  				"checkpoints are occurring too frequently (%d seconds apart)",
  									   elapsed_secs,
  									   elapsed_secs),
! 						 errhint("Consider increasing the configuration parameter \"checkpoint_wal_size\".")));
  
  			/*
  			 * Initialize checkpointer-private variables used during
***************
*** 760,770 **** IsCheckpointOnSchedule(double progress)
  		return false;
  
  	/*
! 	 * Check progress against WAL segments written and checkpoint_segments.
  	 *
  	 * We compare the current WAL insert location against the location
  	 * computed before calling CreateCheckPoint. The code in XLogInsert that
! 	 * actually triggers a checkpoint when checkpoint_segments is exceeded
  	 * compares against RedoRecptr, so this is not completely accurate.
  	 * However, it's good enough for our purposes, we're only calculating an
  	 * estimate anyway.
--- 760,770 ----
  		return false;
  
  	/*
! 	 * Check progress against WAL segments written and CheckPointSegments.
  	 *
  	 * We compare the current WAL insert location against the location
  	 * computed before calling CreateCheckPoint. The code in XLogInsert that
! 	 * actually triggers a checkpoint when CheckPointSegments is exceeded
  	 * compares against RedoRecptr, so this is not completely accurate.
  	 * However, it's good enough for our purposes, we're only calculating an
  	 * estimate anyway.
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
***************
*** 1981,1996 **** static struct config_int ConfigureNamesInt[] =
  	},
  
  	{
! 		{"checkpoint_segments", PGC_SIGHUP, WAL_CHECKPOINTS,
! 			gettext_noop("Sets the maximum distance in log segments between automatic WAL checkpoints."),
! 			NULL
  		},
! 		&CheckPointSegments,
! 		3, 1, INT_MAX,
  		NULL, NULL, NULL
  	},
  
  	{
  		{"checkpoint_timeout", PGC_SIGHUP, WAL_CHECKPOINTS,
  			gettext_noop("Sets the maximum time between automatic WAL checkpoints."),
  			NULL,
--- 1981,2008 ----
  	},
  
  	{
! 		{"min_recycle_wal_size", PGC_SIGHUP, WAL_CHECKPOINTS,
! 			gettext_noop("Sets the minimum size to shrink the WAL to."),
! 			NULL,
! 			GUC_UNIT_KB
  		},
! 		&min_recycle_wal_size,
! 		81920, 32768, INT_MAX,
  		NULL, NULL, NULL
  	},
  
  	{
+ 		{"checkpoint_wal_size", PGC_SIGHUP, WAL_CHECKPOINTS,
+ 			gettext_noop("Sets the maximum WAL size that triggers a checkpoint."),
+ 			NULL,
+ 			GUC_UNIT_KB
+ 		},
+ 		&checkpoint_wal_size,
+ 		262144, 32768, INT_MAX,
+ 		NULL, assign_checkpoint_wal_size, NULL
+ 	},
+ 
+ 	{
  		{"checkpoint_timeout", PGC_SIGHUP, WAL_CHECKPOINTS,
  			gettext_noop("Sets the maximum time between automatic WAL checkpoints."),
  			NULL,
***************
*** 2573,2579 **** static struct config_real ConfigureNamesReal[] =
  		},
  		&CheckPointCompletionTarget,
  		0.5, 0.0, 1.0,
! 		NULL, NULL, NULL
  	},
  
  	/* End-of-list marker */
--- 2585,2591 ----
  		},
  		&CheckPointCompletionTarget,
  		0.5, 0.0, 1.0,
! 		NULL, assign_checkpoint_completion_target, NULL
  	},
  
  	/* End-of-list marker */
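The assign hooks registered above presumably recompute the internal CheckPointSegments from checkpoint_wal_size and checkpoint_completion_target; the actual arithmetic of assign_checkpoint_wal_size() is not part of this excerpt. A plausible sketch, assuming the default 16 MB segment size and a steady-state budget of roughly (2 + completion_target) checkpoint cycles' worth of WAL in pg_xlog:

```python
XLOG_SEG_SIZE_KB = 16 * 1024  # default 16 MB WAL segments (build-time option)

def derive_checkpoint_segments(checkpoint_wal_size_kb, completion_target):
    # Hypothetical reconstruction: at steady state, about
    # (2 + checkpoint_completion_target) cycles of WAL coexist between
    # a checkpoint's start and the removal of no-longer-needed segments,
    # so divide the size budget accordingly and clamp to at least 1.
    segs = checkpoint_wal_size_kb / ((2.0 + completion_target) * XLOG_SEG_SIZE_KB)
    return max(1, int(segs))
```

Under these assumptions, the default checkpoint_wal_size of 262144 KB (256 MB) with completion_target 0.5 yields 6 segments between checkpoint triggers.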
*** a/src/include/access/xlog.h
--- b/src/include/access/xlog.h
***************
*** 181,187 **** extern XLogRecPtr XactLastRecEnd;
  extern bool reachedConsistency;
  
  /* these variables are GUC parameters related to XLOG */
! extern int	CheckPointSegments;
  extern int	wal_keep_segments;
  extern int	XLOGbuffers;
  extern int	XLogArchiveTimeout;
--- 181,188 ----
  extern bool reachedConsistency;
  
  /* these variables are GUC parameters related to XLOG */
! extern int	min_recycle_wal_size;
! extern int	checkpoint_wal_size;
  extern int	wal_keep_segments;
  extern int	XLOGbuffers;
  extern int	XLogArchiveTimeout;
***************
*** 192,197 **** extern bool fullPageWrites;
--- 193,200 ----
  extern bool log_checkpoints;
  extern int	num_xloginsert_slots;
  
+ extern int	CheckPointSegments;
+ 
  /* WAL levels */
  typedef enum WalLevel
  {
***************
*** 319,324 **** extern bool CheckPromoteSignal(void);
--- 322,330 ----
  extern void WakeupRecovery(void);
  extern void SetWalWriterSleeping(bool sleeping);
  
+ extern void assign_checkpoint_wal_size(int newval, void *extra);
+ extern void assign_checkpoint_completion_target(double newval, void *extra);
+ 
  /*
   * Starting/stopping a base backup
   */