Fwd: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-07-20 Thread Fujii Masao
On Sat, Jul 17, 2010 at 4:22 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 The patch adds the document about the relationship between a restartpoint
 and checkpoint_segments parameter.

 Thanks, committed with minor editorialization

Thanks.

  There will always be at least one WAL segment file, and will normally
  not be more than (2 + <varname>checkpoint_completion_target</varname>) *
 <varname>checkpoint_segments</varname> + 1
 +    or <varname>checkpoint_segments</> + <xref
 linkend="guc-wal-keep-segments"> + 1
  files.  Each segment file is normally 16 MB (though this size can be
  altered when building the server).  You can use this to estimate space
  requirements for <acronym>WAL</acronym>.

Sorry, I was wrong here. The correct formula is:

(2 + checkpoint_completion_target) * checkpoint_segments +
wal_keep_segments + 1

The attached patch fixes this mistake. I've also attached a PDF file that
illustrates the proof of the formula.
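
For illustration, here is a minimal standalone sketch (not part of the patch;
the settings below are made-up example values, not recommendations) that plugs
sample values into the corrected formula to estimate the worst-case size of
pg_xlog:

/*
 * Hypothetical example: estimate the maximum number of WAL segment files
 * and the corresponding disk space, using the corrected formula
 * (2 + checkpoint_completion_target) * checkpoint_segments
 *     + wal_keep_segments + 1.
 */
#include <stdio.h>

int
main(void)
{
	double	checkpoint_completion_target = 0.5;	/* default */
	int		checkpoint_segments = 3;			/* default */
	int		wal_keep_segments = 10;				/* example value */
	int		segment_size_mb = 16;				/* default segment size */

	double	max_files = (2 + checkpoint_completion_target) * checkpoint_segments
		+ wal_keep_segments + 1;

	printf("at most ~%.1f segment files, ~%.0f MB in pg_xlog\n",
		   max_files, max_files * segment_size_mb);
	return 0;
}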

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


num_of_wal_formula_v1.patch
Description: Binary data


20100721_num_of_wal_in_pg_xlog.pdf
Description: Adobe PDF document



Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-07-16 Thread Fujii Masao
On Thu, Jul 1, 2010 at 1:09 PM, Fujii Masao masao.fu...@gmail.com wrote:
 Thanks for reminding me. I attached the updated patch.

This patch has been left uncommitted for half a month. Is no one interested in
the patch?

The patch adds documentation about the relationship between a restartpoint
and the checkpoint_segments parameter.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-07-16 Thread Heikki Linnakangas

On 16/07/10 11:13, Fujii Masao wrote:

On Thu, Jul 1, 2010 at 1:09 PM, Fujii Masao masao.fu...@gmail.com wrote:

Thanks for reminding me. I attached the updated patch.


This patch left uncommitted for half a month. No one is interested in
the patch?


Sorry for the lack of interest ;-)


The patch adds the document about the relationship between a restartpoint
and checkpoint_segments parameter.


Thanks, committed with minor editorialization

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-06-30 Thread Bruce Momjian

Did these changes ever get into the docs?  I don't think so.

---

Fujii Masao wrote:
 On Thu, Jun 10, 2010 at 7:19 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
  --- 1902,1908 
           for standby purposes, and the number of old WAL segments available
           for standbys is determined based only on the location of the previous
           checkpoint and status of WAL archiving.
  +        This parameter has no effect on a restartpoint.
           This parameter can only be set in the <filename>postgresql.conf</>
           file or on the server command line.
          </para>
 
  Hmm, I wonder if wal_keep_segments should take effect during recovery too?
  We don't support cascading slaves, but if you have two slaves connected to
  one master (without an archive), and you perform failover to one of them,
  without wal_keep_segments the 2nd slave might not find all the files it
  needs in the new master. Then again, that won't work without an archive
  anyway, because we error out at a TLI mismatch in replication. Seems like
  this is 9.1 material..
 
 Yep, since currently SR cannot get over the gap of TLI, wal_keep_segments
 is not worth taking effect during recovery.
 
  *** a/doc/src/sgml/wal.sgml
  --- b/doc/src/sgml/wal.sgml
  ***
  *** 424,429 
  --- 424,430 
     <para>
      There will always be at least one WAL segment file, and will normally
      not be more than (2 + <varname>checkpoint_completion_target</varname>)
  * <varname>checkpoint_segments</varname> + 1
  +    or <varname>checkpoint_segments</> + <xref
  linkend="guc-wal-keep-segments"> + 1
      files.  Each segment file is normally 16 MB (though this size can be
      altered when building the server).  You can use this to estimate space
      requirements for <acronym>WAL</acronym>.
 
  That's not true, wal_keep_segments is the minimum number of files retained,
  independently of checkpoint_segments. The correct formula is
  max((2 + checkpoint_completion_target) * checkpoint_segments, wal_keep_segments)
 
 You mean that the maximum number of WAL files is: ?
 
 max {
   (2 + checkpoint_completion_target) * checkpoint_segments,
   wal_keep_segments
 }
 
 Just after a checkpoint removes old WAL files, there might be 
 wal_keep_segments
 WAL files. Additionally, checkpoint_segments WAL files might be generated 
 before
 the subsequent checkpoint removes old WAL files. So I think that the maximum
 number is
 
 max {
   (2 + checkpoint_completion_target) * checkpoint_segments,
   wal_keep_segments + checkpoint_segments
 }
 
 Am I missing something?
 
     <para>
  +    In archive recovery or standby mode, the server periodically performs
  +    <firstterm>restartpoints</><indexterm><primary>restartpoint</></>
  +    which are similar to checkpoints in normal operation: the server forces
  +    all its state to disk, updates the <filename>pg_control</> file to
  +    indicate that the already-processed WAL data need not be scanned again,
  +    and then recycles old log segment files if they are in the
  +    <filename>pg_xlog</> directory. Note that this recycling is not affected
  +    by <varname>wal_keep_segments</> at all. A restartpoint is triggered,
  +    if at least one checkpoint record has been replayed since the last
  +    restartpoint, every <varname>checkpoint_timeout</> seconds, or every
  +    <varname>checkpoint_segments</> log segments only in standby mode,
  +    whichever comes first
 
  That last sentence is a bit unclear. How about:
 
  A restartpoint is triggered if at least one checkpoint record has been
  replayed and <varname>checkpoint_timeout</> seconds have passed since last
  restartpoint. In standby mode, a restartpoint is also triggered if
  <varname>checkpoint_segments</> log segments have been replayed since last
  restartpoint and at least one checkpoint record has been replayed since.
 
 Thanks! Seems good.
 
  ... In log shipping case, the checkpoint interval
  +    on the standby is normally smaller than that on the master.
  +   </para>
 
  What does that mean? Restartpoints can't be performed more frequently than
  checkpoints in the master because restartpoints can only be performed at
  checkpoint records.
 
 Yes, that's what I meant.
 
 Regards,
 
 -- 
 Fujii Masao
 NIPPON TELEGRAPH AND TELEPHONE CORPORATION
 NTT Open Source Software Center
 

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + None of us is going to be here forever. +



Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-06-30 Thread Fujii Masao
On Thu, Jul 1, 2010 at 11:39 AM, Bruce Momjian br...@momjian.us wrote:

 Did these changes ever get into the docs?  I don't think so.

Thanks for reminding me. I attached the updated patch.

  That last sentence is a bit unclear. How about:
 
  A restartpoint is triggered if at least one checkpoint record has been
  replayed and <varname>checkpoint_timeout</> seconds have passed since last
  restartpoint. In standby mode, a restartpoint is also triggered if
  <varname>checkpoint_segments</> log segments have been replayed since last
  restartpoint and at least one checkpoint record has been replayed since.

  ... In log shipping case, the checkpoint interval
  +    on the standby is normally smaller than that on the master.
  +   </para>
 
  What does that mean? Restartpoints can't be performed more frequently than
  checkpoints in the master because restartpoints can only be performed at
  checkpoint records.

I adopted these Heikki's sentences.

  *** a/doc/src/sgml/wal.sgml
  --- b/doc/src/sgml/wal.sgml
  ***
  *** 424,429 
  --- 424,430 
     <para>
      There will always be at least one WAL segment file, and will normally
      not be more than (2 + <varname>checkpoint_completion_target</varname>)
  * <varname>checkpoint_segments</varname> + 1
  +    or <varname>checkpoint_segments</> + <xref
  linkend="guc-wal-keep-segments"> + 1
      files.  Each segment file is normally 16 MB (though this size can be
      altered when building the server).  You can use this to estimate space
      requirements for <acronym>WAL</acronym>.
 
  That's not true, wal_keep_segments is the minimum number of files retained,
  independently of checkpoint_segments. The correct formula is
  max((2 + checkpoint_completion_target) * checkpoint_segments, wal_keep_segments)

 You mean that the maximum number of WAL files is: ?

 max {
   (2 + checkpoint_completion_target) * checkpoint_segments,
   wal_keep_segments
 }

 Just after a checkpoint removes old WAL files, there might be 
 wal_keep_segments
 WAL files. Additionally, checkpoint_segments WAL files might be generated 
 before
 the subsequent checkpoint removes old WAL files. So I think that the maximum
 number is

 max {
   (2 + checkpoint_completion_target) * checkpoint_segments,
   wal_keep_segments + checkpoint_segments
 }

 Am I missing something?

I've left this part as it is. Before committing the patch, we need to check
whether my reasoning is correct.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


trigger_restartpoint_doc_v2.patch
Description: Binary data



Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-06-10 Thread Fujii Masao
On Thu, Jun 10, 2010 at 12:09 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Ok, committed with some cosmetic changes.

Thanks!

 BTW, should there be doc changes for this? I didn't find anything explaining
 how restartpoints are triggered, we should add a paragraph somewhere.

+1

What about the attached patch?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


trigger_restartpoint_doc_v1.patch
Description: Binary data



Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-06-10 Thread Heikki Linnakangas

On 10/06/10 09:14, Fujii Masao wrote:

On Thu, Jun 10, 2010 at 12:09 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

BTW, should there be doc changes for this? I didn't find anything explaining
how restartpoints are triggered, we should add a paragraph somewhere.


+1

What about the attached patch?


 (description of wal_keep_segments)

*** 1902,1907  SET ENABLE_SEQSCAN TO OFF;
--- 1902,1908 
  for standby purposes, and the number of old WAL segments available
  for standbys is determined based only on the location of the previous
  checkpoint and status of WAL archiving.
+ This parameter has no effect on a restartpoint.
 This parameter can only be set in the <filename>postgresql.conf</>
 file or on the server command line.
 </para>


Hmm, I wonder if wal_keep_segments should take effect during recovery 
too? We don't support cascading slaves, but if you have two slaves 
connected to one master (without an archive), and you perform failover 
to one of them, without wal_keep_segments the 2nd slave might not find 
all the files it needs in the new master. Then again, that won't work 
without an archive anyway, because we error out at a TLI mismatch in 
replication. Seems like this is 9.1 material..



*** a/doc/src/sgml/wal.sgml
--- b/doc/src/sgml/wal.sgml
***
*** 424,429 
--- 424,430 
<para>
 There will always be at least one WAL segment file, and will normally
 not be more than (2 + <varname>checkpoint_completion_target</varname>) *
<varname>checkpoint_segments</varname> + 1
+    or <varname>checkpoint_segments</> + <xref
linkend="guc-wal-keep-segments"> + 1
 files.  Each segment file is normally 16 MB (though this size can be
 altered when building the server).  You can use this to estimate space
 requirements for <acronym>WAL</acronym>.


That's not true, wal_keep_segments is the minimum number of files
retained, independently of checkpoint_segments. The correct formula is
max((2 + checkpoint_completion_target) * checkpoint_segments, wal_keep_segments)



<para>
+    In archive recovery or standby mode, the server periodically performs
+    <firstterm>restartpoints</><indexterm><primary>restartpoint</></>
+    which are similar to checkpoints in normal operation: the server forces
+    all its state to disk, updates the <filename>pg_control</> file to
+    indicate that the already-processed WAL data need not be scanned again,
+    and then recycles old log segment files if they are in the
+    <filename>pg_xlog</> directory. Note that this recycling is not affected
+    by <varname>wal_keep_segments</> at all. A restartpoint is triggered,
+    if at least one checkpoint record has been replayed since the last
+    restartpoint, every <varname>checkpoint_timeout</> seconds, or every
+    <varname>checkpoint_segments</> log segments only in standby mode,
+    whichever comes first


That last sentence is a bit unclear. How about:

A restartpoint is triggered if at least one checkpoint record has been
replayed and <varname>checkpoint_timeout</> seconds have passed since
last restartpoint. In standby mode, a restartpoint is also triggered if
<varname>checkpoint_segments</> log segments have been replayed since
last restartpoint and at least one checkpoint record has been replayed
since.
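
To make the two conditions concrete, here is a minimal, self-contained sketch
of the triggering rules described above (hypothetical names and parameters;
this is not the actual bgwriter/startup-process code):

/*
 * Minimal sketch of the restartpoint triggering rules, for illustration only.
 */
#include <stdbool.h>
#include <stdio.h>

static bool
restartpoint_needed(bool checkpoint_record_replayed,	/* since last restartpoint */
					int seconds_since_last,				/* since last restartpoint */
					int segments_replayed_since_last,	/* since last restartpoint */
					bool in_standby_mode,
					int checkpoint_timeout,				/* GUC value, in seconds */
					int checkpoint_segments)			/* GUC value */
{
	/* a restartpoint can only happen at a replayed checkpoint record */
	if (!checkpoint_record_replayed)
		return false;

	/* time-based trigger */
	if (seconds_since_last >= checkpoint_timeout)
		return true;

	/* WAL-volume trigger, in standby mode only */
	if (in_standby_mode && segments_replayed_since_last >= checkpoint_segments)
		return true;

	return false;
}

int
main(void)
{
	/* example: standby mode, 3 segments replayed, 100 s elapsed, GUCs 300/3 */
	printf("%d\n", restartpoint_needed(true, 100, 3, true, 300, 3));
	return 0;
}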



... In log shipping case, the checkpoint interval
+on the standby is normally smaller than that on the master.
+   </para>


What does that mean? Restartpoints can't be performed more frequently 
than checkpoints in the master because restartpoints can only be 
performed at checkpoint records.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-06-10 Thread Fujii Masao
On Thu, Jun 10, 2010 at 7:19 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 --- 1902,1908 
          for standby purposes, and the number of old WAL segments
 available
          for standbys is determined based only on the location of the
 previous
          checkpoint and status of WAL archiving.
 +         This parameter has no effect on a restartpoint.
          This parameter can only be set in the
 <filename>postgresql.conf</>
          file or on the server command line.
         </para>

 Hmm, I wonder if wal_keep_segments should take effect during recovery too?
 We don't support cascading slaves, but if you have two slaves connected to
 one master (without an archive), and you perform failover to one of them,
 without wal_keep_segments the 2nd slave might not find all the files it
 needs in the new master. Then again, that won't work without an archive
 anyway, because we error out at a TLI mismatch in replication. Seems like
 this is 9.1 material..

Yep, since currently SR cannot get over the gap of TLI, wal_keep_segments
is not worth taking effect during recovery.

 *** a/doc/src/sgml/wal.sgml
 --- b/doc/src/sgml/wal.sgml
 ***
 *** 424,429 
 --- 424,430 
    <para>
     There will always be at least one WAL segment file, and will normally
     not be more than (2 + <varname>checkpoint_completion_target</varname>)
 * <varname>checkpoint_segments</varname> + 1
 +    or <varname>checkpoint_segments</> + <xref
 linkend="guc-wal-keep-segments"> + 1
     files.  Each segment file is normally 16 MB (though this size can be
     altered when building the server).  You can use this to estimate space
     requirements for <acronym>WAL</acronym>.

 That's not true, wal_keep_segments is the minimum number of files retained,
 independently of checkpoint_segments. The correct formula is
 max((2 + checkpoint_completion_target) * checkpoint_segments, wal_keep_segments)

You mean that the maximum number of WAL files is: ?

max {
  (2 + checkpoint_completion_target) * checkpoint_segments,
  wal_keep_segments
}

Just after a checkpoint removes old WAL files, there might be wal_keep_segments
WAL files. Additionally, checkpoint_segments WAL files might be generated before
the subsequent checkpoint removes old WAL files. So I think that the maximum
number is

max {
  (2 + checkpoint_completion_target) * checkpoint_segments,
  wal_keep_segments + checkpoint_segments
}

Am I missing something?
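
A small worked example may help (using the default
checkpoint_completion_target = 0.5 and checkpoint_segments = 3, and a
hypothetical wal_keep_segments = 10):

  (2 + 0.5) * 3     =  7.5 segments
  10 + 3            = 13   segments
  max of the two    = 13, i.e. more than wal_keep_segments alone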

    <para>
 +    In archive recovery or standby mode, the server periodically performs
 +    <firstterm>restartpoints</><indexterm><primary>restartpoint</></>
 +    which are similar to checkpoints in normal operation: the server forces
 +    all its state to disk, updates the <filename>pg_control</> file to
 +    indicate that the already-processed WAL data need not be scanned again,
 +    and then recycles old log segment files if they are in the
 +    <filename>pg_xlog</> directory. Note that this recycling is not affected
 +    by <varname>wal_keep_segments</> at all. A restartpoint is triggered,
 +    if at least one checkpoint record has been replayed since the last
 +    restartpoint, every <varname>checkpoint_timeout</> seconds, or every
 +    <varname>checkpoint_segments</> log segments only in standby mode,
 +    whichever comes first

 That last sentence is a bit unclear. How about:

 A restartpoint is triggered if at least one checkpoint record has been
 replayed and <varname>checkpoint_timeout</> seconds have passed since last
 restartpoint. In standby mode, a restartpoint is also triggered if
 <varname>checkpoint_segments</> log segments have been replayed since last
 restartpoint and at least one checkpoint record has been replayed since.

Thanks! Seems good.

 ... In log shipping case, the checkpoint interval
 +    on the standby is normally smaller than that on the master.
 +   </para>

 What does that mean? Restartpoints can't be performed more frequently than
 checkpoints in the master because restartpoints can only be performed at
 checkpoint records.

Yes, that's what I meant.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-06-09 Thread Heikki Linnakangas

On 09/06/10 05:26, Fujii Masao wrote:

On Wed, Jun 2, 2010 at 10:24 PM, Fujii Masao masao.fu...@gmail.com wrote:

On Wed, Jun 2, 2010 at 8:40 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

On 02/06/10 06:23, Fujii Masao wrote:


On Mon, May 31, 2010 at 7:17 PM, Fujii Masao masao.fu...@gmail.com wrote:


4) Change it so that checkpoint_segments takes effect in standby mode,
but not during recovery otherwise


I revised the patch to achieve 4). This will enable checkpoint_segments
to trigger a restartpoint like checkpoint_timeout already does, in
standby mode (i.e., streaming replication or file-based log shipping).


Hmm, XLogCtl->Insert.RedoRecPtr is not updated during recovery, so this
doesn't work.


Oops! I revised the patch, which changes CreateRestartPoint() so that
it updates XLogCtl->Insert.RedoRecPtr.


This is one of open items. Please review the patch I submitted, and
please feel free to comment!


Ok, committed with some cosmetic changes.

I thought hard about whether we should do this at all, since the original 
decision to do time-based restartpoints was deliberate. I concluded that 
the tradeoffs have changed enough since then to make this reasonable. We 
now perform restartpoints in the bgwriter, so replay continues while the 
restartpoint is being performed, making it less disruptive than it used 
to be; and secondly, SR stores the streamed WAL files in pg_xlog, making 
it important to perform restartpoints often enough to clean them up and 
avoid running out of disk space.


BTW, should there be doc changes for this? I didn't find anything 
explaining how restartpoints are triggered; we should add a paragraph 
somewhere.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-06-08 Thread Fujii Masao
On Wed, Jun 2, 2010 at 10:24 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Jun 2, 2010 at 8:40 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 On 02/06/10 06:23, Fujii Masao wrote:

 On Mon, May 31, 2010 at 7:17 PM, Fujii Masao masao.fu...@gmail.com wrote:

 4) Change it so that checkpoint_segments takes effect in standby mode,
 but not during recovery otherwise

 I revised the patch to achieve 4). This will enable checkpoint_segments
 to trigger a restartpoint like checkpoint_timeout already does, in
 standby mode (i.e., streaming replication or file-based log shipping).

 Hmm, XLogCtl->Insert.RedoRecPtr is not updated during recovery, so this
 doesn't work.

 Oops! I revised the patch, which changes CreateRestartPoint() so that
 it updates XLogCtl->Insert.RedoRecPtr.

This is one of the open items. Please review the patch I submitted, and
please feel free to comment!

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-06-02 Thread Heikki Linnakangas

On 02/06/10 06:23, Fujii Masao wrote:

On Mon, May 31, 2010 at 7:17 PM, Fujii Masao masao.fu...@gmail.com wrote:

4) Change it so that checkpoint_segments takes effect in standby mode,
but not during recovery otherwise


I revised the patch to achieve 4). This will enable checkpoint_segments
to trigger a restartpoint like checkpoint_timeout already does, in
standby mode (i.e., streaming replication or file-based log shipping).


Hmm, XLogCtl->Insert.RedoRecPtr is not updated during recovery, so this 
doesn't work.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-06-02 Thread Fujii Masao
On Wed, Jun 2, 2010 at 8:40 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 02/06/10 06:23, Fujii Masao wrote:

 On Mon, May 31, 2010 at 7:17 PM, Fujii Masao masao.fu...@gmail.com wrote:

 4) Change it so that checkpoint_segments takes effect in standby mode,
 but not during recovery otherwise

 I revised the patch to achieve 4). This will enable checkpoint_segments
 to trigger a restartpoint like checkpoint_timeout already does, in
 standby mode (i.e., streaming replication or file-based log shipping).

 Hmm, XLogCtl->Insert.RedoRecPtr is not updated during recovery, so this
 doesn't work.

Oops! I revised the patch, which changes CreateRestartPoint() so that
it updates XLogCtl->Insert.RedoRecPtr.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


checkpoint_segments_during_recovery_v3.patch
Description: Binary data



Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-06-01 Thread Heikki Linnakangas

On 31/05/10 18:14, Tom Lane wrote:

Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:

The central question is whether checkpoint_segments should trigger
restartpoints or not. When PITR and restartpoints were introduced, the
answer was no, on the grounds that when you're doing recovery you're
presumably replaying the logs much faster than they were generated, and
you don't want to slow down the recovery by checkpointing too often.



Now that we have bgwriter active during recovery, and streaming
replication which retains the streamed WALs so that we now risk running
out of disk space with long checkpoint_timeout, it's time to reconsider
that.



I think we have three options:


What about

(4) pay some attention to the actual elapsed time since the last
restart point?

All the others seem like kluges that are relying on hard-wired rules
that are hoped to achieve something like a time-based checkpoint.


Huh? We already do time-based restartpoints, there's nothing wrong with 
that logic AFAIK. The problem that started this thread is that we don't 
do WAL-space consumption based restartpoints, i.e. checkpoint_segments 
does nothing in standby mode.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-06-01 Thread Fujii Masao
On Mon, May 31, 2010 at 7:17 PM, Fujii Masao masao.fu...@gmail.com wrote:
 4) Change it so that checkpoint_segments takes effect in standby mode,
 but not during recovery otherwise

I revised the patch to achieve 4). This will enable checkpoint_segments
to trigger a restartpoint like checkpoint_timeout already does, in
standby mode (i.e., streaming replication or file-based log shipping).

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


checkpoint_segments_during_recovery_v2.patch
Description: Binary data



Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-05-31 Thread Heikki Linnakangas

On 30/05/10 06:04, Fujii Masao wrote:

On Fri, May 28, 2010 at 11:12 AM, Fujii Masao masao.fu...@gmail.com wrote:

On Thu, May 27, 2010 at 11:13 PM, Robert Haas robertmh...@gmail.com wrote:

I guess this happens because the frequency of checkpoint on the standby is
too lower than that on the master. In the master, checkpoint occurs for every
consumption of three segments because of checkpoint_segments = 3. On the
other hand, in the standby, only checkpoint_timeout has effect, so checkpoint
occurs for every 30 minutes because of checkpoint_timeout = 30min.

The walreceiver should signal the bgwriter to start checkpoint if it has
received more than checkpoint_segments WAL files, like normal processing?


Is this also an issue when using log shipping, or just with SR?


When using log shipping, checkpoint_segments always doesn't trigger a
checkpoint. So recovery after the standby crashes might take unexpectedly
long since redo starting point might be old.

But in file-based log shipping, since WAL files don't accumulate in
pg_xlog directory on the standby, even if the frequency of checkpoint
is very low, pg_xlog will not be filled with many WAL files. That
accumulation occurs only when using SR.

If we should avoid low frequency of checkpoint itself rather than
accumulation of WAL files, the bgwriter instead of the walreceiver
should check if we've consumed too much WAL, I think. Thought?


I attached the patch, which changes the startup process so that it signals
bgwriter to perform a restartpoint if we've already replayed too much WAL
files. This leads checkpoint_segments to trigger a restartpoint.


The central question is whether checkpoint_segments should trigger 
restartpoints or not. When PITR and restartpoints were introduced, the 
answer was no, on the grounds that when you're doing recovery you're 
presumably replaying the logs much faster than they were generated, and 
you don't want to slow down the recovery by checkpointing too often.


Now that we have bgwriter active during recovery, and streaming 
replication which retains the streamed WALs so that we now risk running 
out of disk space with long checkpoint_timeout, it's time to reconsider 
that.


I think we have three options:

1) Leave it as it is, checkpoint_segments doesn't do anything during 
recovery/standby mode


2) Change it so that checkpoint_segments does take effect during 
recover/standby


3) Change it so that checkpoint_segments takes effect during streaming 
replication, but not during recovery otherwise


I'm leaning towards 3); it still seems reasonable not to slow down 
recovery when recovering from an archive, but the potential for running 
out of disk space warrants doing 3).


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-05-31 Thread Fujii Masao
On Mon, May 31, 2010 at 6:37 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 The central question is whether checkpoint_segments should trigger
 restartpoints or not. When PITR and restartpoints were introduced, the
 answer was no, on the grounds that when you're doing recovery you're
 presumably replaying the logs much faster than they were generated, and you
 don't want to slow down the recovery by checkpointing too often.

Right.

 Now that we have bgwriter active during recovery, and streaming replication
 which retains the streamed WALs so that we now risk running out of disk
 space with long checkpoint_timeout, it's time to reconsider that.

 I think we have three options:

 1) Leave it as it is, checkpoint_segments doesn't do anything during
 recovery/standby mode

 2) Change it so that checkpoint_segments does take effect during
 recover/standby

 3) Change it so that checkpoint_segments takes effect during streaming
 replication, but not during recovery otherwise

 I'm leaning towards 3), it still seems reasonable to not slow down recovery
 when recovering from archive, but the potential for out of disk space
 warrants doing 3.

3) makes sense. But how about 4)?

4) Change it so that checkpoint_segments takes effect in standby mode,
but not during recovery otherwise

This would lessen the time required to restart the standby in the
file-based log shipping case as well. Of course, there is a tradeoff
between the speed of recovery and the time needed to restart the standby.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-05-31 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 The central question is whether checkpoint_segments should trigger 
 restartpoints or not. When PITR and restartpoints were introduced, the 
 answer was no, on the grounds that when you're doing recovery you're 
 presumably replaying the logs much faster than they were generated, and 
 you don't want to slow down the recovery by checkpointing too often.

 Now that we have bgwriter active during recovery, and streaming 
 replication which retains the streamed WALs so that we now risk running 
 out of disk space with long checkpoint_timeout, it's time to reconsider 
 that.

 I think we have three options:

What about

(4) pay some attention to the actual elapsed time since the last
restart point?

All the others seem like kluges that are relying on hard-wired rules
that are hoped to achieve something like a time-based checkpoint.

regards, tom lane



Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-05-29 Thread Fujii Masao
On Fri, May 28, 2010 at 11:12 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, May 27, 2010 at 11:13 PM, Robert Haas robertmh...@gmail.com wrote:
 I guess this happens because the frequency of checkpoint on the standby is
 too lower than that on the master. In the master, checkpoint occurs for 
 every
 consumption of three segments because of checkpoint_segments = 3. On the
 other hand, in the standby, only checkpoint_timeout has effect, so 
 checkpoint
 occurs for every 30 minutes because of checkpoint_timeout = 30min.

 The walreceiver should signal the bgwriter to start checkpoint if it has
 received more than checkpoint_segments WAL files, like normal processing?

 Is this also an issue when using log shipping, or just with SR?

 When using log shipping, checkpoint_segments always doesn't trigger a
 checkpoint. So recovery after the standby crashes might take unexpectedly
 long since redo starting point might be old.

 But in file-based log shipping, since WAL files don't accumulate in
 pg_xlog directory on the standby, even if the frequency of checkpoint
 is very low, pg_xlog will not be filled with many WAL files. That
 accumulation occurs only when using SR.

 If we should avoid low frequency of checkpoint itself rather than
 accumulation of WAL files, the bgwriter instead of the walreceiver
 should check if we've consumed too much WAL, I think. Thought?

I attached the patch, which changes the startup process so that it signals
the bgwriter to perform a restartpoint if we've already replayed too many WAL
files. This lets checkpoint_segments trigger a restartpoint.

Is this patch worth applying for 9.0? If not, I'll add it to the next CF.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 508,513  static bool reachedMinRecoveryPoint = false;
--- 508,516 
  
  static bool InRedo = false;
  
+ /* We've already launched bgwriter to perform restartpoint? */
+ static bool bgwriterLaunched = false;
+ 
  /*
   * Information logged when we detect a change in one of the parameters
   * important for Hot Standby.
***
*** 550,555  static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
--- 553,559 
  static bool XLogCheckBuffer(XLogRecData *rdata, bool doPageWrites,
  XLogRecPtr *lsn, BkpBlock *bkpb);
  static bool AdvanceXLInsertBuffer(bool new_segment);
+ static bool XLogCheckpointNeeded(uint32 logid, uint32 logseg);
  static void XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool xlog_switch);
  static bool InstallXLogFileSegment(uint32 *log, uint32 *seg, char *tmppath,
  	   bool find_free, int *max_advance,
***
*** 1554,1567  AdvanceXLInsertBuffer(bool new_segment)
  /*
   * Check whether we've consumed enough xlog space that a checkpoint is needed.
   *
!  * Caller must have just finished filling the open log file (so that
!  * openLogId/openLogSeg are valid).  We measure the distance from RedoRecPtr
!  * to the open log file and see if that exceeds CheckPointSegments.
   *
   * Note: it is caller's responsibility that RedoRecPtr is up-to-date.
   */
  static bool
! XLogCheckpointNeeded(void)
  {
  	/*
  	 * A straight computation of segment number could overflow 32 bits. Rather
--- 1558,1571 
  /*
   * Check whether we've consumed enough xlog space that a checkpoint is needed.
   *
!  * Caller must have just finished filling or reading the log file (so that
!  * the given logid/logseg are valid).  We measure the distance from RedoRecPtr
!  * to the log file and see if that exceeds CheckPointSegments.
   *
   * Note: it is caller's responsibility that RedoRecPtr is up-to-date.
   */
  static bool
! XLogCheckpointNeeded(uint32 logid, uint32 logseg)
  {
  	/*
  	 * A straight computation of segment number could overflow 32 bits. Rather
***
*** 1577,1584  XLogCheckpointNeeded(void)
  	old_segno = (RedoRecPtr.xlogid % XLogSegSize) * XLogSegsPerFile +
  		(RedoRecPtr.xrecoff / XLogSegSize);
  	old_highbits = RedoRecPtr.xlogid / XLogSegSize;
! 	new_segno = (openLogId % XLogSegSize) * XLogSegsPerFile + openLogSeg;
! 	new_highbits = openLogId / XLogSegSize;
  	if (new_highbits != old_highbits ||
  		new_segno >= old_segno + (uint32) (CheckPointSegments - 1))
  		return true;
--- 1581,1588 
  	old_segno = (RedoRecPtr.xlogid % XLogSegSize) * XLogSegsPerFile +
  		(RedoRecPtr.xrecoff / XLogSegSize);
  	old_highbits = RedoRecPtr.xlogid / XLogSegSize;
! 	new_segno = (logid % XLogSegSize) * XLogSegsPerFile + logseg;
! 	new_highbits = logid / XLogSegSize;
  	if (new_highbits != old_highbits ||
  		new_segno >= old_segno + (uint32) (CheckPointSegments - 1))
  		return true;
***
*** 1782,1791  XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool xlog_switch)
   * update RedoRecPtr and recheck.
   */
  if (IsUnderPostmaster &&
! 	

[HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-05-27 Thread Sander, Ingo (NSN - DE/Munich)

With the parameters checkpoint_segments and wal_keep_segments the max. number of 
wal segments is set. If the max number is reached, 
(1) the segments are deleted/recycled, 
or (2) if the time set by checkpoint_timeout is over, a checkpoint is performed 
and, if possible, a deletion/recycling is done. 
This is the mechanism on the active side of a db server. On the standby side, 
however, unused transferred segments are only deleted when the 
checkpoint_timeout mechanism (2) is executed.
Is this the correct behaviour, or is it an error? 
 
I have observed (checkpoint_segments set to 3; wal_keep_segments set to 10 and 
checkpoint_timeout set to 30min) that in my stress test the disk usage on the 
standby side increases up to 2 GB of xlog segments, whereas on the active 
side only ~60 MB of xlog files are present (we have patched the xlog file size to 
4 MB). To prevent this, one possibility is to decrease checkpoint_timeout to 
a low value (30sec); however, this has the disadvantage that a checkpoint is 
executed often on the active side, which can affect performance. Another 
possibility is to have different postgresql.conf files on the active and standby 
sides, but this is not our preferred solution. 
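
To put the reported figures in perspective (simple arithmetic on the numbers
above):

  2 GB on the standby  / 4 MB per segment  ~ 500 retained WAL segments
  ~60 MB on the master / 4 MB per segment  ~  15 retained WAL segments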



Best Regards/mfG
Ingo Sander
=
Nokia Siemens Networks GmbH & Co. KG
NWS EP CP SVSS Platform Tech Support DE
St.-Martin-Str. 76
D-81541 München
Tel.:  +49-89-515938390
ingo.san...@nsn.com




Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-05-27 Thread Fujii Masao
On Thu, May 27, 2010 at 10:13 PM, Sander, Ingo (NSN - DE/Munich)
ingo.san...@nsn.com wrote:

 With the parameter checkpoint_segment and wal_keep_segments the max. number
 of wal segments are set. If now the max number is reached,

 (1) the segments are deleted/recycled
 or (2) if the time set by the checkpoint_timeout is over, a checkpoint is
 set and if possible a deletion/recycling is done.

 This is the mechanism on the active side of a db server. On the standby side
 however only unused tranferred segments will be deleted if the
 checkpoint_timeout mechanism (2) is executed.

 Is this a correct behaviour or it is an error?

 I have observed (checkpoint_segment set to 3; wal_keep_segments set to 10
 and checkpoint_timeout set to 30min) that in my stress test the disk usage
 on standby side is increased up to 2GB with xlog segments whereby on the
 active side only ~60MB xlog files are available (we have patched the xlog
 file size to 4MB). To prevent this one possibility is to decreace the
 checkpoint_timeout to a low value (30sec), however this had the disadvantage
 that a checkpoint is often executed on active side which can influence the
 performance. Another possibility is to have different postgresql.conf on
 active and on standby side, but this is not our preferred solution.

I guess this happens because the frequency of checkpoints on the standby is
much lower than that on the master. On the master, a checkpoint occurs for every
consumption of three segments because of checkpoint_segments = 3. On the
other hand, on the standby only checkpoint_timeout has any effect, so a checkpoint
occurs every 30 minutes because of checkpoint_timeout = 30min.

Should the walreceiver signal the bgwriter to start a checkpoint if it has
received more than checkpoint_segments WAL files, as in normal processing?
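
Very roughly, the kind of check being proposed might look like the sketch
below (hypothetical names only; this is not the code that was eventually
committed, which ended up doing the check in the startup process instead):

/*
 * Hypothetical sketch: after receiving a WAL segment, compare the distance
 * from the last restartpoint with checkpoint_segments and, if exceeded,
 * ask the bgwriter to perform a restartpoint.
 */
#include <stdio.h>

/* stand-in for signalling the bgwriter */
static void
request_restartpoint(void)
{
	printf("signal bgwriter: restartpoint wanted\n");
}

static void
wal_segment_received(long received_segno, long last_restartpoint_segno,
					 int checkpoint_segments)
{
	if (received_segno - last_restartpoint_segno >= checkpoint_segments)
		request_restartpoint();
}

int
main(void)
{
	/* example: 5 segments received since the last restartpoint, GUC = 3 */
	wal_segment_received(105, 100, 3);
	return 0;
}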

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-05-27 Thread Robert Haas
On Thu, May 27, 2010 at 10:09 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, May 27, 2010 at 10:13 PM, Sander, Ingo (NSN - DE/Munich)
 ingo.san...@nsn.com wrote:

 With the parameter checkpoint_segment and wal_keep_segments the max. number
 of wal segments are set. If now the max number is reached,

 (1) the segments are deleted/recycled
 or (2) if the time set by the checkpoint_timeout is over, a checkpoint is
 set and if possible a deletion/recycling is done.

 This is the mechanism on the active side of a db server. On the standby side
 however only unused tranferred segments will be deleted if the
 checkpoint_timeout mechanism (2) is executed.

 Is this a correct behaviour or it is an error?

 I have observed (checkpoint_segment set to 3; wal_keep_segments set to 10
 and checkpoint_timeout set to 30min) that in my stress test the disk usage
 on standby side is increased up to 2GB with xlog segments whereby on the
 active side only ~60MB xlog files are available (we have patched the xlog
 file size to 4MB). To prevent this one possibility is to decreace the
 checkpoint_timeout to a low value (30sec), however this had the disadvantage
 that a checkpoint is often executed on active side which can influence the
 performance. Another possibility is to have different postgresql.conf on
 active and on standby side, but this is not our preferred solution.

 I guess this happens because the frequency of checkpoint on the standby is
 too lower than that on the master. In the master, checkpoint occurs for every
 consumption of three segments because of checkpoint_segments = 3. On the
 other hand, in the standby, only checkpoint_timeout has effect, so checkpoint
 occurs for every 30 minutes because of checkpoint_timeout = 30min.

 The walreceiver should signal the bgwriter to start checkpoint if it has
 received more than checkpoint_segments WAL files, like normal processing?

Is this also an issue when using log shipping, or just with SR?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-05-27 Thread Fujii Masao
On Thu, May 27, 2010 at 11:13 PM, Robert Haas robertmh...@gmail.com wrote:
 I guess this happens because the frequency of checkpoint on the standby is
 too lower than that on the master. In the master, checkpoint occurs for every
 consumption of three segments because of checkpoint_segments = 3. On the
 other hand, in the standby, only checkpoint_timeout has effect, so checkpoint
 occurs for every 30 minutes because of checkpoint_timeout = 30min.

 The walreceiver should signal the bgwriter to start checkpoint if it has
 received more than checkpoint_segments WAL files, like normal processing?

 Is this also an issue when using log shipping, or just with SR?

When using log shipping, checkpoint_segments never triggers a
checkpoint. So recovery after the standby crashes might take unexpectedly
long, since the redo starting point might be old.

But in file-based log shipping, since WAL files don't accumulate in the
pg_xlog directory on the standby, even if the frequency of checkpoints
is very low, pg_xlog will not fill up with many WAL files. That
accumulation occurs only when using SR.

If what we should avoid is the low frequency of checkpoints itself rather than
the accumulation of WAL files, then the bgwriter instead of the walreceiver
should check whether we've consumed too much WAL, I think. Thoughts?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-05-27 Thread Sander, Ingo (NSN - DE/Munich)
Both nodes (active and standby) have the same configuration parameters. 
The observed effect also happens if the checkpoint timeout is decreased.

The problem seems to be that on the standby no checkpoints are written and 
only the checkpoint_timeout mechanism is active.

Regards
Ingo

-----Original Message-----
From: ext Fujii Masao [mailto:masao.fu...@gmail.com] 
Sent: Thursday, May 27, 2010 4:10 PM
To: Sander, Ingo (NSN - DE/Munich)
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Streaming Replication: Checkpoint_segment and
wal_keep_segments on standby

On Thu, May 27, 2010 at 10:13 PM, Sander, Ingo (NSN - DE/Munich)
ingo.san...@nsn.com wrote:

 With the parameter checkpoint_segment and wal_keep_segments the max.
number
 of wal segments are set. If now the max number is reached,

 (1) the segments are deleted/recycled
 or (2) if the time set by the checkpoint_timeout is over, a checkpoint
is
 set and if possible a deletion/recycling is done.

 This is the mechanism on the active side of a db server. On the
standby side
 however only unused tranferred segments will be deleted if the
 checkpoint_timeout mechanism (2) is executed.

 Is this a correct behaviour or it is an error?

 I have observed (checkpoint_segment set to 3; wal_keep_segments set to
10
 and checkpoint_timeout set to 30min) that in my stress test the disk
usage
 on standby side is increased up to 2GB with xlog segments whereby on
the
 active side only ~60MB xlog files are available (we have patched the
xlog
 file size to 4MB). To prevent this one possibility is to decreace the
 checkpoint_timeout to a low value (30sec), however this had the
disadvantage
 that a checkpoint is often executed on active side which can influence
the
 performance. Another possibility is to have different postgresql.conf
on
 active and on standby side, but this is not our preferred solution.

I guess this happens because the frequency of checkpoint on the standby
is
too lower than that on the master. In the master, checkpoint occurs for
every
consumption of three segments because of checkpoint_segments = 3. On
the
other hand, in the standby, only checkpoint_timeout has effect, so
checkpoint
occurs for every 30 minutes because of checkpoint_timeout = 30min.

The walreceiver should signal the bgwriter to start checkpoint if it has
received more than checkpoint_segments WAL files, like normal
processing?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
