Re: [GENERAL] High checkpoint_segments

2012-02-15 Thread Albe Laurenz
Jay Levitt wrote:
 We need to do a few bulk updates as Rails migrations.  We're a typical
 read-mostly web site, so at the moment, our checkpoint settings and WAL
 are all default (3 segments, 5 min, 16MB), and updating a million rows
 takes 10 minutes due to all the checkpointing.

 We have no replication or hot standbys.  As a consumer-web startup, with
 no SLA, and not a huge database, and if we ever do have to recover from
 downtime it's ok if it takes longer... is there a reason NOT to always
 run with something like checkpoint_segments = 1000, as long as I leave
 the timeout at 5m?

There's nothing wrong with the idea, except for the amount of WAL it
keeps around and the huge checkpoint that can stall your system for a
while in a worst-case scenario.  You can't get rid of checkpoint I/O
completely.

I'd tune to a more conservative value, maybe 30 or at most 100, and see
if that solves your problem.  Check the statistics to see whether your
checkpoints are time-driven or not.  Once almost all checkpoints are
time-driven, raising checkpoint_segments further won't do anything for
you.
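
For example, something like this (untested; assumes PostgreSQL 8.3 or
later, where pg_stat_bgwriter carries per-type checkpoint counters)
shows the ratio:

-- Share of checkpoints that were time-driven rather than forced by
-- running out of WAL segments; once pct_timed is close to 100,
-- raising checkpoint_segments further buys nothing.
SELECT checkpoints_timed,
       checkpoints_req,
       round(100.0 * checkpoints_timed /
             nullif(checkpoints_timed + checkpoints_req, 0), 1)
         AS pct_timed
  FROM pg_stat_bgwriter;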

Yours,
Laurenz Albe



Re: [GENERAL] High checkpoint_segments

2012-02-15 Thread Venkat Balaji
On Wed, Feb 15, 2012 at 12:21 PM, Scott Marlowe scott.marl...@gmail.com wrote:

 On Tue, Feb 14, 2012 at 10:57 PM, Venkat Balaji venkat.bal...@verse.in wrote:

  On Wed, Feb 15, 2012 at 1:35 AM, Jay Levitt jay.lev...@gmail.com wrote:

   We need to do a few bulk updates as Rails migrations.  We're a typical
   read-mostly web site, so at the moment, our checkpoint settings and
   WAL are all default (3 segments, 5 min, 16MB), and updating a million
   rows takes 10 minutes due to all the checkpointing.

   We have no replication or hot standbys.  As a consumer-web startup,
   with no SLA, and not a huge database, and if we ever do have to
   recover from downtime it's ok if it takes longer... is there a reason
   NOT to always run with something like checkpoint_segments = 1000, as
   long as I leave the timeout at 5m?

  Still, checkpoints keep occurring every 5 mins.  Anyway,
  checkpoint_segments = 1000 is huge; it implies 16MB * 1000 = 16000MB
  worth of pg_xlog data, which is not advisable from an I/O perspective
  or a data-loss perspective.  Even in the most unimaginable case, if
  all of these 1000 files get filled up in less than 5 mins, there is a
  chance the system will slow down due to high I/O and CPU.

 As far as I know there is no data loss issue with a lot of checkpoint
 segments.


Data loss would be an issue when there is a server crash or a pg_xlog
crash etc.  That many pg_xlog files (1000) would contribute to huge data
loss (data changes not yet synced to the base files are not guaranteed).
Of course, this is not related to the current situation.  Normally we
weigh the checkpoint completion time, I/O pressure, CPU load and the
threat of data loss when we configure checkpoint_segments.


Re: [GENERAL] High checkpoint_segments

2012-02-15 Thread Tomas Vondra
On 15 February 2012, 10:38, Venkat Balaji wrote:

 Data loss would be an issue when there is a server crash or a pg_xlog
 crash etc.  That many pg_xlog files (1000) would contribute to huge data
 loss (data changes not yet synced to the base files are not guaranteed).
 Of course, this is not related to the current situation.  Normally we
 weigh the checkpoint completion time, I/O pressure, CPU load and the
 threat of data loss when we configure checkpoint_segments.

So you're saying that by using a small number of checkpoint segments you
limit the data loss when the WAL gets corrupted/lost? That's a bit like
buying a Maserati and then never going faster than 10mph because you
might crash at higher speeds ...

The problem here is that the WAL is usually placed on more reliable
drives (compared to the data files) or on a RAID1 array, and as it's
written sequentially, the usage pattern is much less likely to cause
data/drive corruption (compared to data files, which have to handle a
lot of random I/O, etc.).

So while it is possible the WAL might get corrupted, the probability of
data file corruption is much higher. And that corruption might easily
happen silently during a checkpoint, in which case no amount of retained
WAL segments will help you ...

And by using a low number of checkpoint segments it actually gets worse,
because it means more frequent checkpoints - more I/O on the drives,
more wear on the drives, etc.

If you need to protect yourself against this, you need to keep a WAL
archive (preferably on a separate machine) and/or a hot standby for
failover.
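
A minimal archiving sketch (the host and paths are placeholders, and the
command needs testing in your environment before you rely on it):

# postgresql.conf -- ship each WAL segment off-box as it completes
wal_level = archive        # 9.0+; not needed on 8.x
archive_mode = on          # changing this requires a restart
archive_command = 'rsync -a %p backuphost:/wal_archive/%f'
archive_timeout = 60       # force a segment switch at least once a minute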

kind regards
Tomas




Re: [GENERAL] High checkpoint_segments

2012-02-15 Thread Andres Freund
On Wednesday, February 15, 2012 10:38:23 AM Venkat Balaji wrote:
 On Wed, Feb 15, 2012 at 12:21 PM, Scott Marlowe scott.marl...@gmail.com wrote:
  On Tue, Feb 14, 2012 at 10:57 PM, Venkat Balaji venkat.bal...@verse.in wrote:
   all of these 1000 files get filled up in less than 5 mins, there is a
   chance the system will slow down due to high I/O and CPU.
  As far as I know there is no data loss issue with a lot of checkpoint
  segments.
 Data loss would be an issue when there is a server crash or a pg_xlog
 crash etc.  That many pg_xlog files (1000) would contribute to huge data
 loss (data changes not yet synced to the base files are not guaranteed).
 Of course, this is not related to the current situation.  Normally we
 weigh the checkpoint completion time, I/O pressure, CPU load and the
 threat of data loss when we configure checkpoint_segments.
I think you might be misunderstanding something. A high number of
checkpoint_segments can lead to slower recovery - all those changes need
to be reapplied - but it won't lead to lost data. The data inside the WAL
will be fsynced at the appropriate times (commit; background writer; too
much written).
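
To make that explicit, these are the stock defaults (shown here purely
as an illustration) under which a COMMIT does not return until its WAL
is flushed to disk, regardless of checkpoint_segments:

# postgresql.conf defaults -- committed data survives a crash because
# the WAL covering it was already fsynced at commit time
fsync = on
synchronous_commit = on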


Andres



Re: [GENERAL] High checkpoint_segments

2012-02-15 Thread Venkat Balaji
  Data loss would be an issue when there is a server crash or a pg_xlog
  crash etc.  That many pg_xlog files (1000) would contribute to huge data
  loss (data changes not yet synced to the base files are not guaranteed).
  Of course, this is not related to the current situation.  Normally we
  weigh the checkpoint completion time, I/O pressure, CPU load and the
  threat of data loss when we configure checkpoint_segments.

 So you're saying that by using a small number of checkpoint segments you
 limit the data loss when the WAL gets corrupted/lost? That's a bit like
 buying a Maserati and then never going faster than 10mph because you
 might crash at higher speeds ...


No, I am not saying that checkpoint_segments must be lower. I was just
trying to explain the I/O overhead of putting it high (as high as 1000).
A lower number of checkpoint segments will lead to more frequent I/O,
which is not good. Agreed.


 The problem here is that the WAL is usually placed on more reliable
 drives (compared to the data files) or on a RAID1 array, and as it's
 written sequentially, the usage pattern is much less likely to cause
 data/drive corruption (compared to data files, which have to handle a
 lot of random I/O, etc.).


Agreed.


 So while it is possible the WAL might get corrupted, the probability of
 data file corruption is much higher. And that corruption might easily
 happen silently during a checkpoint, in which case no amount of retained
 WAL segments will help you ...


Agreed. When corruption occurs, it really does not matter how many WAL
segments are kept in pg_xlog. But at any point in time, if PG needs
pg_xlog to recover and the files are not available, that is when the
data loss I described occurs (more on this below).


 And by using a low number of checkpoint segments it actually gets worse,
 because it means more frequent checkpoints - more I/O on the drives,
 more wear on the drives, etc.


Completely agreed, as mentioned above. I choose checkpoint_segments and
checkpoint_timeout once I have observed the checkpoint behavior.

 If you need to protect yourself against this, you need to keep a WAL
 archive (preferably on a separate machine) and/or a hot standby for
 failover.


WAL archiving is a different situation, wherein you back up the pg_xlog
files by enabling archiving. I was referring to the specific situation
wherein the pg_xlogs are not archived, the data has not yet been synced
to the base files (by the bgwriter), and the system crashes. PG would
then depend on pg_xlog to recover and reach a consistent state; if
pg_xlog is also not available, there would be data loss, and the extent
depends on how much data was present in the pg_xlog files.

Thanks,
VB


Re: [GENERAL] High checkpoint_segments

2012-02-15 Thread Venkat Balaji
On Wed, Feb 15, 2012 at 4:12 PM, Andres Freund and...@anarazel.de wrote:

 On Wednesday, February 15, 2012 10:38:23 AM Venkat Balaji wrote:
  On Wed, Feb 15, 2012 at 12:21 PM, Scott Marlowe scott.marl...@gmail.com wrote:
   On Tue, Feb 14, 2012 at 10:57 PM, Venkat Balaji venkat.bal...@verse.in wrote:
    all of these 1000 files get filled up in less than 5 mins, there is a
    chance the system will slow down due to high I/O and CPU.
   As far as I know there is no data loss issue with a lot of checkpoint
   segments.
  Data loss would be an issue when there is a server crash or a pg_xlog
  crash etc.  That many pg_xlog files (1000) would contribute to huge data
  loss (data changes not yet synced to the base files are not guaranteed).
  Of course, this is not related to the current situation.  Normally we
  weigh the checkpoint completion time, I/O pressure, CPU load and the
  threat of data loss when we configure checkpoint_segments.
 I think you might be misunderstanding something. A high number of
 checkpoint_segments can lead to slower recovery - all those changes need
 to be reapplied - but it won't lead to lost data. The data inside the WAL
 will be fsynced at the appropriate times (commit; background writer; too
 much written).


Recovery would take time because all the changes in the WAL files of
pg_xlog (of which there are many) must be replayed to reach a consistent
state. When disaster strikes, if the pg_xlogs are not available and their
contents had not yet been synced to the base files, then recovery is not
possible and the data loss can be huge. How much depends on how much data
had not been synced.

Thanks,
VB


[GENERAL] High checkpoint_segments

2012-02-14 Thread Jay Levitt
We need to do a few bulk updates as Rails migrations.  We're a typical 
read-mostly web site, so at the moment, our checkpoint settings and WAL are 
all default (3 segments, 5 min, 16MB), and updating a million rows takes 10 
minutes due to all the checkpointing.


We have no replication or hot standbys.  As a consumer-web startup, with no
SLA, and not a huge database, and if we ever do have to recover from
downtime it's ok if it takes longer... is there a reason NOT to always run
with something like checkpoint_segments = 1000, as long as I leave the
timeout at 5m?
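
In postgresql.conf terms, something like this (the 16MB segment size is
compiled in, not a setting):

checkpoint_segments = 1000   # default 3; roughly 16GB of pg_xlog
checkpoint_timeout = 5min    # left at the default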


Jay Levitt



Re: [GENERAL] High checkpoint_segments

2012-02-14 Thread Venkat Balaji
On Wed, Feb 15, 2012 at 1:35 AM, Jay Levitt jay.lev...@gmail.com wrote:

 We need to do a few bulk updates as Rails migrations.  We're a typical
 read-mostly web site, so at the moment, our checkpoint settings and WAL are
 all default (3 segments, 5 min, 16MB), and updating a million rows takes 10
 minutes due to all the checkpointing.

 We have no replication or hot standbys.  As a consumer-web startup, with
 no SLA, and not a huge database, and if we ever do have to recover from
 downtime it's ok if it takes longer... is there a reason NOT to always run
 with something like checkpoint_segments = 1000, as long as I leave the
 timeout at 5m?


Still, checkpoints keep occurring every 5 mins.  Anyway,
checkpoint_segments = 1000 is huge; it implies 16MB * 1000 = 16000MB
worth of pg_xlog data, which is not advisable from an I/O perspective
or a data-loss perspective.  Even in the most unimaginable case, if
all of these 1000 files get filled up in less than 5 mins, there is a
chance the system will slow down due to high I/O and CPU.

You may think of increasing checkpoint_timeout as well, but some
monitoring and analysis is needed to arrive at a number.

What does pg_stat_bgwriter say about checkpoints?
Do you have log_checkpoints enabled?
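
If not, it is cheap to turn on (a sketch; both settings take effect on a
reload, no restart needed):

# postgresql.conf -- log every checkpoint with timings and buffer counts
log_checkpoints = on
checkpoint_warning = 30s    # warn when WAL-driven checkpoints come
                            # closer together than this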

Thanks
VB


Re: [GENERAL] High checkpoint_segments

2012-02-14 Thread Scott Marlowe
On Tue, Feb 14, 2012 at 10:57 PM, Venkat Balaji venkat.bal...@verse.in wrote:

 On Wed, Feb 15, 2012 at 1:35 AM, Jay Levitt jay.lev...@gmail.com wrote:

  We need to do a few bulk updates as Rails migrations.  We're a typical
  read-mostly web site, so at the moment, our checkpoint settings and WAL
  are all default (3 segments, 5 min, 16MB), and updating a million rows
  takes 10 minutes due to all the checkpointing.

  We have no replication or hot standbys.  As a consumer-web startup,
  with no SLA, and not a huge database, and if we ever do have to recover
  from downtime it's ok if it takes longer... is there a reason NOT to
  always run with something like checkpoint_segments = 1000, as long as I
  leave the timeout at 5m?

 Still, checkpoints keep occurring every 5 mins.  Anyway,
 checkpoint_segments = 1000 is huge; it implies 16MB * 1000 = 16000MB
 worth of pg_xlog data, which is not advisable from an I/O perspective
 or a data-loss perspective.  Even in the most unimaginable case, if
 all of these 1000 files get filled up in less than 5 mins, there is a
 chance the system will slow down due to high I/O and CPU.

As far as I know there is no data loss issue with a lot of checkpoint segments.
