[HACKERS] wal_buffers = -1

2014-01-17 Thread Magnus Hagander
Is there any real use-case for not setting wal_buffers to -1 these days?

Or should we just remove it and use the -1 behaviour at all times?

IIRC we discussed not keeping it at all when the autotune behavior was
introduced, but said we wanted to keep it just in case. If we're not
ready to remove it, then does that just mean that we need to fix it so we
can?
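
For context, the -1 auto-tuning referred to here derives wal_buffers from
shared_buffers. A rough sketch of the rule, in Python; the exact clamps live
in xlog.c and may differ slightly, so treat this as an approximation:

```python
# Rough sketch of the wal_buffers = -1 auto-tuning rule: use about
# 1/32 of shared_buffers, clamped to at most one WAL segment (16MB).
# The exact lower clamp in xlog.c may differ; this is an approximation.

XLOG_BLCKSZ = 8192                    # WAL block size in bytes
WAL_SEGMENT_SIZE = 16 * 1024 * 1024   # one WAL segment

def auto_wal_buffers(shared_buffers_bytes: int) -> int:
    """Return the auto-tuned wal_buffers size in bytes."""
    size = shared_buffers_bytes // 32
    size = min(size, WAL_SEGMENT_SIZE)   # cap: one 16MB segment
    size = max(size, 8 * XLOG_BLCKSZ)    # floor: a handful of WAL blocks
    return size

# With shared_buffers = 8GB, the cap kicks in and we get 16MB:
print(auto_wal_buffers(8 * 1024**3) // (1024 * 1024), "MB")  # -> 16 MB
```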

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] wal_buffers = -1

2014-01-17 Thread Thom Brown
On 17 January 2014 13:01, Magnus Hagander mag...@hagander.net wrote:
 Is there any real use-case for not setting wal_buffers to -1 these days?

 Or should we just remove it and use the -1 behaviour at all times?

 IIRC we discussed not keeping it at all when the autotune behavior was
 introduced, but said we wanted to keep it just in case. If we're not ready
 to remove it, then does that just mean that we need to fix it so we can?

Robert Haas reported that setting it to 32MB can yield a considerable
performance benefit:

http://www.postgresql.org/message-id/CA+TgmobgMv_aaakLoasBt=5iYfi=kdcOUz0X9TdYe0c2SZ=2...@mail.gmail.com
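
For anyone wanting to reproduce that, overriding the auto-tuned value is a
one-line change (note that wal_buffers can only be set at server start):

```ini
# postgresql.conf -- override the auto-tuned default of -1;
# requires a server restart to take effect.
wal_buffers = 32MB
```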

-- 
Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wal_buffers = -1

2014-01-17 Thread Andres Freund
Hi,

On 2014-01-17 14:01:56 +0100, Magnus Hagander wrote:
 Is there any real use-case for not setting wal_buffers to -1 these days?
 
 Or should we just remove it and use the -1 behaviour at all times?

I have seen improvements from setting it larger than the maximum value that -1
will choose. Also, for some workloads (low latency) it can be beneficial to
have a small shared_buffers but still have a larger wal_buffers setting.

So -1 for removing it from me.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] wal_buffers = -1

2014-01-17 Thread Magnus Hagander
On Fri, Jan 17, 2014 at 2:07 PM, Thom Brown t...@linux.com wrote:

 On 17 January 2014 13:01, Magnus Hagander mag...@hagander.net wrote:
  Is there any real use-case for not setting wal_buffers to -1 these days?
 
  Or should we just remove it and use the -1 behaviour at all times?
 
  IIRC we discussed not keeping it at all when the autotune behavior was
  introduced, but said we wanted to keep it just in case. If we're not
 ready
  to remove it, then does that just mean that we need to fix it so we can?

 Robert Haas reported that setting it to 32MB can yield a considerable
 performance benefit:


 http://www.postgresql.org/message-id/CA+TgmobgMv_aaakLoasBt=5iYfi=kdcOUz0X9TdYe0c2SZ=2...@mail.gmail.com


In that case, shouldn't the autotuning be changed to not limit it at 16MB? :)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] wal_buffers = -1

2014-01-17 Thread Thom Brown
On 17 January 2014 13:20, Magnus Hagander mag...@hagander.net wrote:
 On Fri, Jan 17, 2014 at 2:07 PM, Thom Brown t...@linux.com wrote:

 On 17 January 2014 13:01, Magnus Hagander mag...@hagander.net wrote:
  Is there any real use-case for not setting wal_buffers to -1 these days?
 
  Or should we just remove it and use the -1 behaviour at all times?
 
  IIRC we discussed not keeping it at all when the autotune behavior was
  introduced, but said we wanted to keep it just in case. If we're not
  ready
  to remove it, then does that just mean that we need to fix it so we can?

 Robert Haas reported that setting it to 32MB can yield a considerable
 performance benefit:


 http://www.postgresql.org/message-id/CA+TgmobgMv_aaakLoasBt=5iYfi=kdcOUz0X9TdYe0c2SZ=2...@mail.gmail.com


 In that case, shouldn't the autotuning be changed to not limit it at 16MB? :)

Well, that's the question.  Do we have a heuristic sweet-spot that
folk would agree on?

-- 
Thom




Re: [HACKERS] wal_buffers = -1

2014-01-17 Thread Robert Haas
On Fri, Jan 17, 2014 at 8:20 AM, Magnus Hagander mag...@hagander.net wrote:
 Robert Haas reported that setting it to 32MB can yield a considerable
 performance benefit:

 http://www.postgresql.org/message-id/CA+TgmobgMv_aaakLoasBt=5iYfi=kdcOUz0X9TdYe0c2SZ=2...@mail.gmail.com

 In that case, shouldn't the autotuning be changed to not limit it at 16MB? :)

I'm in favor of keeping the setting; I think that the auto-tuning has
largely eliminated the pain in this area for the majority of users,
but that doesn't mean we should deny someone who really wants to
squeeze the last drop of performance out of their system the
opportunity to poke at it manually.  I doubt it's the least useful
setting we have.  The test above shows 32MB beating 16MB, but I think
I did other tests where 16MB and 64MB came out the same.

Back when I was working heavily on performance, I did a number of
tests to try to understand what events cause latency spikes.  Many of
those events are related to write-ahead logging.  It turns out that
writing a page from PostgreSQL's WAL buffers to the OS cache is
usually pretty fast, unless the OS thinks we're dirtying data too
quickly and decides to slam on the brakes.  Calling fsync() to get the
data out to disk, though, is very slow.  However, both of those
operations are protected by the same lock (WALWriteLock), so it's
frequently the case that no more WAL buffer space can be freed up by
calling write() because the guy who has the lock is busy waiting for
an fsync().  That sucks, because there's no intrinsic reason why we
can't have one backend doing a write() while another backend is doing
an fsync().  On a related note, there's no real reason why the poor
bastard who writes the WAL record that fills a segment should be
forced to synchronously flush the segment instead of letting it be
done in the background, but right now he is.

I think if we fix these problems, the optimal value for wal_buffers is
likely to change; however, I'm not certain we'll be able to
auto-tune it perfectly on day one.  Having a setting makes it easier
for people to experiment with different values, and I think that's
good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] wal_buffers

2012-08-30 Thread Robert Haas
On Wed, Aug 29, 2012 at 10:25 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 19 February 2012 05:24, Robert Haas robertmh...@gmail.com wrote:
 I have attached tps scatterplots.  The obvious conclusion appears to
 be that, with only 16MB of wal_buffers, the buffer wraps around with
 some regularity: we can't insert more WAL because the buffer we need
 to use still contains WAL that hasn't yet been fsync'd, leading to
 long stalls.  More buffer space ameliorates the problem.

 Incidentally, I wondered if we could further improve group commit
 performance by implementing commit_delay with a WaitLatch call, and
 setting the latch in the event of WAL buffers wraparound (or rather, a
 queued wraparound request - a segment switch needs WALWriteLock, which
 the group commit leader holds for a relatively long time during the
 delay). I'm not really sure how significant a win this might be,
 though. There could be other types of contention, which could be
 considerably more significant. I'll try and take a look at it next
 week.

I have a feeling that one of the big bottlenecks here is that we force
an immediate fsync when we reach the end of a segment.  I think it was
originally done that way to keep the code simple, and it does
accomplish that, but it's not so hot for performance.  More generally,
I think we really need to split WALWriteLock into two locks, one to
protect the write position and the other to protect the flush
position.  I think we're often ending up with a write (which is
usually fast) waiting for a flush (which is often much slower) when in
fact those things ought to be able to happen in parallel.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] wal_buffers

2012-08-30 Thread Amit Kapila
On Thursday, August 30, 2012 7:14 PM Robert Haas wrote:
On Wed, Aug 29, 2012 at 10:25 PM, Peter Geoghegan pe...@2ndquadrant.com
wrote:
 On 19 February 2012 05:24, Robert Haas robertmh...@gmail.com wrote:
 I have attached tps scatterplots.  The obvious conclusion appears to
 be that, with only 16MB of wal_buffers, the buffer wraps around with
 some regularity: we can't insert more WAL because the buffer we need
 to use still contains WAL that hasn't yet been fsync'd, leading to
 long stalls.  More buffer space ameliorates the problem.

 Incidentally, I wondered if we could further improve group commit
 performance by implementing commit_delay with a WaitLatch call, and
 setting the latch in the event of WAL buffers wraparound (or rather, a
 queued wraparound request - a segment switch needs WALWriteLock, which
 the group commit leader holds for a relatively long time during the
 delay). I'm not really sure how significant a win this might be,
 though. There could be other types of contention, which could be
 considerably more significant. I'll try and take a look at it next
 week.

 I have a feeling that one of the big bottlenecks here is that we force
 an immediate fsync when we reach the end of a segment.  I think it was
 originally done that way to keep the code simple, and it does
 accomplish that, but it's not so hot for performance.  More generally,
 I think we really need to split WALWriteLock into two locks, one to
 protect the write position and the other to protect the flush
 position.  I think we're often ending up with a write (which is
 usually fast) waiting for a flush (which is often much slower) when in
 fact those things ought to be able to happen in parallel.

  This is a really good idea for splitting WALWriteLock into two locks,
  but in that case do we need separate handling for the OPEN_SYNC method,
  where write and flush happen together?

  And more about WAL: do you have any suggestions regarding the idea of
  triggering WALWriter when the XLog buffers are nearly full?

With Regards,
Amit Kapila.









Re: [HACKERS] wal_buffers

2012-08-29 Thread Amit Kapila
On Tuesday, August 28, 2012 9:33 PM Bruce Momjian wrote:
On Tue, Aug 28, 2012 at 09:40:33AM +0530, Amit Kapila wrote:
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Bruce Momjian
 
 Added to TODO:
 
 Allow reporting of stalls due to wal_buffer wrap-around
   

 http://archives.postgresql.org/pgsql-hackers/2012-02/msg00826.php 
 
 Doesn't this indicate that XLOG writing needs some tuning, such that
 when some threshold of buffers (say 2/3) is full, the LOGWriter is triggered?

 I assumed the LOGWriter was already working as fast as it could, but
 couldn't keep up.

 I suspect that in some cases it might not; for example:

1. Assume 16M of XLog buffers.
2. 6M or less of that is filled.
3. The background writer decides how much to flush and starts writing; at the
   same time, backends start filling the remaining 10M of the buffers.
4. The background writer goes to sleep after flushing 6M.
5. Backends have filled all 16M; in this case it may happen that some
   backends need to do I/O themselves.


Some other cases where I think it can be useful to wake LogWriter
Case-1
---
1. The log writer delay is at its default of 200ms, or set to some higher
   value by the user.
2. All the configured buffers get filled; backends might need to do I/O.

Case-2
---
The case-1 scenario can also happen even if the user has set wal_buffers = -1
(auto-tuning of WAL buffers), because that reserves XLog buffers equivalent to
one segment file; under high load, if that gets filled, backends might need to
do I/O.
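
The wakeup condition sketched above can be modeled as follows (a toy sketch
with made-up numbers, not PostgreSQL code):

```python
# Toy model of the idea above: backends wake the WAL writer as soon as a
# threshold fraction of the WAL buffers fills, instead of relying only on
# the wal_writer_delay timer. The threshold and sizes are illustrative.
WAL_BUFFERS = 2048          # 8kB pages, i.e. 16MB of WAL buffers
WAKE_THRESHOLD = 2 / 3      # the fraction suggested above

def should_wake_walwriter(filled_pages: int) -> bool:
    """Wake the WAL writer early instead of waiting out wal_writer_delay."""
    return filled_pages >= WAL_BUFFERS * WAKE_THRESHOLD

print(should_wake_walwriter(1024))  # half full       -> False
print(should_wake_walwriter(1400))  # about 68% full  -> True
```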


---

 
 On Sun, Feb 19, 2012 at 12:24:12AM -0500, Robert Haas wrote:
  Just for kicks, I ran two 30-minute pgbench tests at scale factor 300
  tonight on Nate Boley's machine, with -n -l -c 32 -j 32.  The
  configurations were identical, except that on one of them, I set
  wal_buffers=64MB.  It seemed to make quite a lot of difference:
  
  wal_buffers not set (thus, 16MB):
  tps = 3162.594605 (including connections establishing)
  
  wal_buffers=64MB:
  tps = 6164.194625 (including connections establishing)
  
  Rest of config: shared_buffers = 8GB, maintenance_work_mem = 1GB,
  synchronous_commit = off, checkpoint_segments = 300,
  checkpoint_timeout = 15min, checkpoint_completion_target = 0.9,
  wal_writer_delay = 20ms
  
  I have attached tps scatterplots.  The obvious conclusion appears to
  be that, with only 16MB of wal_buffers, the buffer wraps around with
  some regularity: we can't insert more WAL because the buffer we need
  to use still contains WAL that hasn't yet been fsync'd, leading to
  long stalls.  More buffer space ameliorates the problem.  This is not
  very surprising, when you think about it: it's clear that the peak tps
  rate approaches 18k/s on these tests; right after a checkpoint, every
  update will force a full page write - that is, a WAL record > 8kB.  So
  we'll fill up a 16MB WAL segment in about a tenth of a second.  That
  doesn't leave much breathing room.  I think we might want to consider
  adjusting our auto-tuning formula for wal_buffers to allow for a
  higher cap, although this is obviously not enough data to draw any
  firm conclusions.
  
  -- 
  Robert Haas
  EnterpriseDB: http://www.enterprisedb.com
  The Enterprise PostgreSQL Company
 
 
 
  
 
 
 -- 
   Bruce Momjian  br...@momjian.ushttp://momjian.us
   EnterpriseDB http://enterprisedb.com
 
   + It's impossible for everything to be true. +
 
 
 

With Regards,
Amit Kapila.





Re: [HACKERS] wal_buffers

2012-08-29 Thread Peter Geoghegan
On 19 February 2012 05:24, Robert Haas robertmh...@gmail.com wrote:
 I have attached tps scatterplots.  The obvious conclusion appears to
 be that, with only 16MB of wal_buffers, the buffer wraps around with
 some regularity: we can't insert more WAL because the buffer we need
 to use still contains WAL that hasn't yet been fsync'd, leading to
 long stalls.  More buffer space ameliorates the problem.

Incidentally, I wondered if we could further improve group commit
performance by implementing commit_delay with a WaitLatch call, and
setting the latch in the event of WAL buffers wraparound (or rather, a
queued wraparound request - a segment switch needs WALWriteLock, which
the group commit leader holds for a relatively long time during the
delay). I'm not really sure how significant a win this might be,
though. There could be other types of contention, which could be
considerably more significant. I'll try and take a look at it next
week.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] wal_buffers

2012-08-28 Thread Bruce Momjian
On Tue, Aug 28, 2012 at 09:40:33AM +0530, Amit Kapila wrote:
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Bruce Momjian
 
  Added to TODO:
 
  Allow reporting of stalls due to wal_buffer wrap-around
   
 
 http://archives.postgresql.org/pgsql-hackers/2012-02/msg00826.php 
 
 Doesn't this indicate that XLOG writing needs some tuning, such that
 when some threshold of buffers (say 2/3) is full, the LOGWriter is triggered?

I assumed the LOGWriter was already working as fast as it could, but
couldn't keep up.

---

 
 On Sun, Feb 19, 2012 at 12:24:12AM -0500, Robert Haas wrote:
  Just for kicks, I ran two 30-minute pgbench tests at scale factor 300
  tonight on Nate Boley's machine, with -n -l -c 32 -j 32.  The
  configurations were identical, except that on one of them, I set
  wal_buffers=64MB.  It seemed to make quite a lot of difference:
  
  wal_buffers not set (thus, 16MB):
  tps = 3162.594605 (including connections establishing)
  
  wal_buffers=64MB:
  tps = 6164.194625 (including connections establishing)
  
  Rest of config: shared_buffers = 8GB, maintenance_work_mem = 1GB,
  synchronous_commit = off, checkpoint_segments = 300,
  checkpoint_timeout = 15min, checkpoint_completion_target = 0.9,
  wal_writer_delay = 20ms
  
  I have attached tps scatterplots.  The obvious conclusion appears to
  be that, with only 16MB of wal_buffers, the buffer wraps around with
  some regularity: we can't insert more WAL because the buffer we need
  to use still contains WAL that hasn't yet been fsync'd, leading to
  long stalls.  More buffer space ameliorates the problem.  This is not
  very surprising, when you think about it: it's clear that the peak tps
  rate approaches 18k/s on these tests; right after a checkpoint, every
  update will force a full page write - that is, a WAL record > 8kB.  So
  we'll fill up a 16MB WAL segment in about a tenth of a second.  That
  doesn't leave much breathing room.  I think we might want to consider
  adjusting our auto-tuning formula for wal_buffers to allow for a
  higher cap, although this is obviously not enough data to draw any
  firm conclusions.
  
  -- 
  Robert Haas
  EnterpriseDB: http://www.enterprisedb.com
  The Enterprise PostgreSQL Company
 
 
 
  
 
 
 -- 
   Bruce Momjian  br...@momjian.ushttp://momjian.us
   EnterpriseDB http://enterprisedb.com
 
   + It's impossible for everything to be true. +
 
 
 

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] wal_buffers

2012-08-27 Thread Bruce Momjian

Added to TODO:

Allow reporting of stalls due to wal_buffer wrap-around

http://archives.postgresql.org/pgsql-hackers/2012-02/msg00826.php 

---

On Sun, Feb 19, 2012 at 12:24:12AM -0500, Robert Haas wrote:
 Just for kicks, I ran two 30-minute pgbench tests at scale factor 300
 tonight on Nate Boley's machine, with -n -l -c 32 -j 32.  The
 configurations were identical, except that on one of them, I set
 wal_buffers=64MB.  It seemed to make quite a lot of difference:
 
 wal_buffers not set (thus, 16MB):
 tps = 3162.594605 (including connections establishing)
 
 wal_buffers=64MB:
 tps = 6164.194625 (including connections establishing)
 
 Rest of config: shared_buffers = 8GB, maintenance_work_mem = 1GB,
 synchronous_commit = off, checkpoint_segments = 300,
 checkpoint_timeout = 15min, checkpoint_completion_target = 0.9,
 wal_writer_delay = 20ms
 
 I have attached tps scatterplots.  The obvious conclusion appears to
 be that, with only 16MB of wal_buffers, the buffer wraps around with
 some regularity: we can't insert more WAL because the buffer we need
 to use still contains WAL that hasn't yet been fsync'd, leading to
 long stalls.  More buffer space ameliorates the problem.  This is not
 very surprising, when you think about it: it's clear that the peak tps
 rate approaches 18k/s on these tests; right after a checkpoint, every
 update will force a full page write - that is, a WAL record > 8kB.  So
 we'll fill up a 16MB WAL segment in about a tenth of a second.  That
 doesn't leave much breathing room.  I think we might want to consider
 adjusting our auto-tuning formula for wal_buffers to allow for a
 higher cap, although this is obviously not enough data to draw any
 firm conclusions.
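
The back-of-envelope numbers in that paragraph can be checked directly:

```python
# Sanity-checking the quoted arithmetic: at ~18k updates/sec, each
# forcing a full-page write (a WAL record of roughly 8kB) right after a
# checkpoint, a 16MB WAL segment fills in about a tenth of a second.
SEGMENT_BYTES = 16 * 1024 * 1024
RECORD_BYTES = 8 * 1024          # ~8kB full-page-write record
TPS = 18_000

records_per_segment = SEGMENT_BYTES // RECORD_BYTES
fill_time_secs = records_per_segment / TPS
print(records_per_segment, round(fill_time_secs, 2))  # -> 2048 0.11
```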
 
 -- 
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



 


-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] wal_buffers

2012-08-27 Thread Amit Kapila
From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Bruce Momjian

 Added to TODO:

   Allow reporting of stalls due to wal_buffer wrap-around


http://archives.postgresql.org/pgsql-hackers/2012-02/msg00826.php 

Doesn't this indicate that XLOG writing needs some tuning, such that
when some threshold of buffers (say 2/3) is full, the LOGWriter is triggered?



---

On Sun, Feb 19, 2012 at 12:24:12AM -0500, Robert Haas wrote:
 Just for kicks, I ran two 30-minute pgbench tests at scale factor 300
 tonight on Nate Boley's machine, with -n -l -c 32 -j 32.  The
 configurations were identical, except that on one of them, I set
 wal_buffers=64MB.  It seemed to make quite a lot of difference:
 
 wal_buffers not set (thus, 16MB):
 tps = 3162.594605 (including connections establishing)
 
 wal_buffers=64MB:
 tps = 6164.194625 (including connections establishing)
 
 Rest of config: shared_buffers = 8GB, maintenance_work_mem = 1GB,
 synchronous_commit = off, checkpoint_segments = 300,
 checkpoint_timeout = 15min, checkpoint_completion_target = 0.9,
 wal_writer_delay = 20ms
 
 I have attached tps scatterplots.  The obvious conclusion appears to
 be that, with only 16MB of wal_buffers, the buffer wraps around with
 some regularity: we can't insert more WAL because the buffer we need
 to use still contains WAL that hasn't yet been fsync'd, leading to
 long stalls.  More buffer space ameliorates the problem.  This is not
 very surprising, when you think about it: it's clear that the peak tps
 rate approaches 18k/s on these tests; right after a checkpoint, every
 update will force a full page write - that is, a WAL record > 8kB.  So
 we'll fill up a 16MB WAL segment in about a tenth of a second.  That
 doesn't leave much breathing room.  I think we might want to consider
 adjusting our auto-tuning formula for wal_buffers to allow for a
 higher cap, although this is obviously not enough data to draw any
 firm conclusions.
 
 -- 
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



 


-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +







Re: [HACKERS] wal_buffers, redux

2012-03-14 Thread Jeff Janes
On Mon, Mar 12, 2012 at 7:16 AM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Mar 11, 2012 at 11:51 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sun, Mar 11, 2012 at 12:55 PM, Robert Haas robertmh...@gmail.com wrote:
 I've finally been able to run some more tests of the effect of
 adjusting wal_buffers to values higher than 16MB.  I ran the test on
 the 16 core (x 4 hw threads/core) IBM POWER7 machine, with my usual
 configuration settings:

 shared_buffers = 8GB
 maintenance_work_mem = 1GB
 synchronous_commit = off
 checkpoint_segments = 300
 checkpoint_timeout = 15min
 checkpoint_completion_target = 0.9
 wal_writer_delay = 20ms

 I ran three 30-minute tests at scale factor 300 with wal_buffers set
 at various values from 16MB up to 160MB, in multiples of 16MB, using
 pgbench with 32 clients and 32 threads in each case.  The short
 version is that 32MB seems to be significantly better than 16MB, by
 about 1000 tps, and after that it gets murky; full results are below.

 Currently the max of wal_buffers is 16MB (i.e., the size of one WAL file)
 when it's set to -1. Thanks to your result, we should increase the max to
 32MB?

 I think that might be a good idea, although I'm not entirely convinced
 that we understand why increasing wal_buffers is helping as much as it
 is.  I stuck an elog() into AdvanceXLInsertBuffer() to complain in the
 case that we were writing buffers while holding the insert lock.
 Then, I reran 30-minute tests with 32 clients, one with wal_buffers=16MB
 and the other wal_buffers=32MB.  On the 16MB test, the elog() fired 15
 times in a single second shortly after the start of the test, and then
 9 more times over the rest of the test.  On the 32MB test, the elog()
 fired a total 6 times over the course of the test.  The first test got
 14320 tps, while the second got 15026 tps.  I find that quite
 surprising, because although WAL buffer wraparound is certainly bad
 (on this test, it probably brings the system completely to a halt
 until fsync() finishes) it really shouldn't lock up the system for
 multiple seconds at a time.  And yet that's what it would need to be
 doing to account for the tps discrepancy on this test, considering how
 rarely it occurs.

Where did you put the elog?  If you put it near
TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_START(), it is probably too
late in the code.

If someone else (like the background wal writer) is blocked on an
fsync, then AdvanceXLInsertBuffer will block on getting the
WALWriteLock.  Once it gets the lock, it will likely find it no longer
needs to do the write (because it was done by the thing that just
blocked and then released it), but at that point the damage has
already been done.  The damage is not that it has to do a write, but
that it had to block (indirectly) on an fsync which it didn't really
care about.

One possibility is that the writer should update
xlogctl->LogwrtResult.Write between finishing the write and starting
the fsync.  That way an AdvanceXLInsertBuffer that arrives during the
fsync could see that the data it needs to overwrite has already been
written, even if not yet fsynced, and that would be enough to allow it
to proceed.

That would be ugly, because AdvanceXLInsertBuffer that arrives during
the write itself would check xlogctl and then block until the fsync
finished, with no way to wake up as soon as the write-part finished.
So making it work cleanly would require a redesign of the whole
locking scheme.  Which the Scaling patch is already doing.




 Studying AdvanceXLInsertBuffer() a bit more, I'm wondering if the
 problem isn't so much - or isn't only - that it's expensive to write
 buffers while also holding WALInsertLock.  Maybe it's too expensive
 even to acquire WalWriteLock in the first place - that is, the real
 problem isn't so much the wraparound condition itself, but the expense
 of testing whether a possible wraparound has actually occurred.  A
 quick test suggests that we acquire WALWriteLock here much more often
 than we actually write anything while holding it, and that we get a
 big burst of WALWriteLock acquisitions here immediately after a
 checkpoint.  I don't have any proof that this is what's causing the
 tps drop with smaller wal_buffers, but I think there has to be
 something other than an actual wraparound condition causing problems
 here, because that just doesn't seem to happen frequently enough to be
 an issue.

I think my analysis is pretty much a re-wording of yours, but I'd
emphasize that getting the WALWriteLock is bad not just because they
fight over the lock, but because someone else (probably background wal
writer) is camping out on the lock while doing an fsync.

Cheers,

Jeff



Re: [HACKERS] wal_buffers, redux

2012-03-14 Thread Robert Haas
On Tue, Mar 13, 2012 at 11:18 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Mar 13, 2012 at 6:44 PM, Josh Berkus j...@agliodbs.com wrote:
 That's a speedup of nearly a factor of two, so clearly fsync-related
 stalls are a big problem here, even with wal_buffers cranked up
 through the ceiling.

 Hmm.  Do you have any ability to test on XFS?

 It seems I do.

 XFS, with fsync = on:
 tps = 14746.687499 (including connections establishing)
 XFS, with fsync = off:
 tps = 25121.876560 (including connections establishing)

 No real dramatic difference there, maybe a bit slower.

 On further thought, it may be that this is just a simple case of too
 many checkpoints.  With fsync=off, we don't have to actually write all
 that dirty data back to disk.  I'm going to try cranking up
 checkpoint_segments and see what happens.

OK, this is bizarre.  I wiped out my XFS filesystem and put ext4 back,
and look at this:

tps = 19105.740878 (including connections establishing)
tps = 19687.674409 (including connections establishing)

That's a jump of nearly a third from before.  I'm not sure what's
different.  Nothing, AFAIK.  I drop and recreate the database after
every test run, so I don't see why this should be so much better,
unless ext4 degrades over time (even though the FS is nearly empty,
and I'm dropping the whole database after each test run).

Then I tried it with checkpoint_segments=3000 rather than 300.

tps = 26750.190469 (including connections establishing)

Hmm, what happens with checkpoint_segments=3000 and fsync=off?

tps = 30395.583366 (including connections establishing)

Hmm, and what if I set checkpoint_segments=300 and fsync=off?

tps = 26029.160919 (including connections establishing)

Not sure what to make of all this, yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] wal_buffers, redux

2012-03-14 Thread Robert Haas
On Wed, Mar 14, 2012 at 3:29 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 I think my analysis is pretty much a re-wording of yours, but I'd
 emphasize that getting the WALWriteLock is bad not just because they
 fight over the lock, but because someone else (probably background wal
 writer) is camping out on the lock while doing an fsync.

Yeah, good point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] wal_buffers, redux

2012-03-13 Thread Andres Freund
On Tuesday, March 13, 2012 03:26:34 AM Robert Haas wrote:
 Meanwhile, here are some TPS graphs at 16MB and 32MB on the IBM POWER7
 machine.  32 clients, 1800 seconds, scale factor 300, synchronous
 commit off.
That graph makes me cringe because it's pretty representative of what I have 
seen in practice. Any chance we can get a ~10s average in there? I find it 
pretty hard to read anything from it beyond "spiky".

What's the filesystem + options on this?

Andres



Re: [HACKERS] wal_buffers, redux

2012-03-13 Thread Robert Haas
On Mon, Mar 12, 2012 at 4:45 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 Rerunning all 4 benchmarks (both 16MB and 32MB wal_buffers on both
 machines) with fsync=off (as well as synchronous_commit=off still)
 might help clarify things.
 If it increases the TPS of Nate@16MB, but doesn't change the other 3
 situations much, then that suggests the IO system is driving it.
 Basically moving up to 32MB is partially inoculating against slow
 fsyncs upon log switch on that machine.

I had the idea of running pg_test_fsync on each machine.  Here, for
what it's worth, are the numbers.

Nate Boley's box:

2 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync             116.604 ops/sec
        fdatasync                 108.654 ops/sec
        fsync                      20.234 ops/sec
        fsync_writethrough                    n/a
        open_sync                  17.979 ops/sec

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync              39.833 ops/sec
        fdatasync                  58.072 ops/sec
        fsync                      19.756 ops/sec
        fsync_writethrough                    n/a
        open_sync                   8.425 ops/sec

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
         1 * 16kB open_sync write  17.408 ops/sec
         2 *  8kB open_sync writes  9.376 ops/sec
         4 *  4kB open_sync writes  4.912 ops/sec
         8 *  2kB open_sync writes  2.477 ops/sec
        16 *  1kB open_sync writes  1.244 ops/sec

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        write, fsync, close        19.818 ops/sec
        write, close, fsync        19.086 ops/sec

Non-Sync'ed 8kB writes:
        write                  101176.089 ops/sec

IBM POWER7 server:

2 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync             167.009 ops/sec
        fdatasync                 167.091 ops/sec
        fsync                      47.321 ops/sec
        fsync_writethrough                    n/a
        open_sync                  53.735 ops/sec

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync              68.891 ops/sec
        fdatasync                 164.597 ops/sec
        fsync                      50.334 ops/sec
        fsync_writethrough                    n/a
        open_sync                  25.240 ops/sec

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
         1 * 16kB open_sync write  53.305 ops/sec
         2 *  8kB open_sync writes 23.863 ops/sec
         4 *  4kB open_sync writes 12.500 ops/sec
         8 *  2kB open_sync writes  4.734 ops/sec
        16 *  1kB open_sync writes  3.063 ops/sec

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        write, fsync, close        48.730 ops/sec
        write, close, fsync        48.463 ops/sec

Non-Sync'ed 8kB writes:
        write                  186465.321 ops/sec
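One way to read those figures (my arithmetic, not part of the original message): each op is one synchronous flush, so the single-write rate caps how often any one backend can durably commit, and implies the per-flush latency that group commit has to amortize.

```python
# ops/sec from the "one 8kB write" test on each machine (open_datasync)
nate_ops = 116.604
ibm_ops = 167.009

def flush_latency_ms(ops_per_sec: float) -> float:
    """Average wall-clock cost of one synchronous flush, in milliseconds."""
    return 1000.0 / ops_per_sec

print(round(flush_latency_ms(nate_ops), 1))  # 8.6 ms per flush on Nate Boley's box
print(round(flush_latency_ms(ibm_ops), 1))   # 6.0 ms per flush on the POWER7 server
```

With synchronous_commit = off, as in these benchmarks, commits don't wait on this path, which is why tps can be two orders of magnitude higher than the flush rate.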

I can't rerun any more serious benchmarks on Nate Boley's box right
now due to other activity on the box.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] wal_buffers, redux

2012-03-13 Thread Robert Haas
On Mon, Mar 12, 2012 at 4:45 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 Rerunning all 4 benchmarks (both 16MB and 32MB wal_buffers on both
 machines) with fsync=off (as well as synchronous_commit=off still)
 might help clarify things.

I reran the 32-client benchmark on the IBM machine with fsync=off and got this:

32MB: tps = 26809.442903 (including connections establishing)
16MB: tps = 26651.320145 (including connections establishing)

That's a speedup of nearly a factor of two, so clearly fsync-related
stalls are a big problem here, even with wal_buffers cranked up
through the ceiling.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] wal_buffers, redux

2012-03-13 Thread Robert Haas
On Tue, Mar 13, 2012 at 3:48 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Mar 12, 2012 at 4:45 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 Rerunning all 4 benchmarks (both 16MB and 32MB wal_buffers on both
 machines) with fsync=off (as well as synchronous_commit=off still)
 might help clarify things.

 I reran the 32-client benchmark on the IBM machine with fsync=off and got 
 this:

 32MB: tps = 26809.442903 (including connections establishing)
 16MB: tps = 26651.320145 (including connections establishing)

 That's a speedup of nearly a factor of two, so clearly fsync-related
 stalls are a big problem here, even with wal_buffers cranked up
 through the ceiling.

And here's a tps plot with wal_buffers = 16MB, fsync = off.  The
performance still bounces up and down, so there's obviously some other
factor contributing to latency spikes, but equally clearly, needing to
wait for fsyncs makes it a lot worse.  I bet if we could understand
why that happens, we could improve things here a good deal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
attachment: tps-16MB.no-fsync.png


Re: [HACKERS] wal_buffers, redux

2012-03-13 Thread Robert Haas
On Tue, Mar 13, 2012 at 4:55 AM, Andres Freund and...@anarazel.de wrote:
 On Tuesday, March 13, 2012 03:26:34 AM Robert Haas wrote:
 Meanwhile, here are some TPS graphs at 16MB and 32MB on the IBM POWER7
 machine.  32 clients, 1800 seconds, scale factor 300, synchronous
 commit off.
 That graph makes me cringe because it's pretty representative of what I have
 seen in practice. Any chance we can get a ~10s average in there? I find it
 pretty hard to read anything from it beyond "spiky".

I tried that but don't find it an improvement; however I'm attaching
an example for your consideration.

 What's the filesystem + options on this?

ext4, default options.  Fedora 16, 3.2.6-3.fc16.ppc64.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
attachment: tps-16MB-10s.png


Re: [HACKERS] wal_buffers, redux

2012-03-13 Thread Josh Berkus

 That's a speedup of nearly a factor of two, so clearly fsync-related
 stalls are a big problem here, even with wal_buffers cranked up
 through the ceiling.

Hmmm.   Do you have any ability to test on XFS?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] wal_buffers, redux

2012-03-13 Thread Fujii Masao
On Wed, Mar 14, 2012 at 7:20 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Mar 13, 2012 at 3:48 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Mar 12, 2012 at 4:45 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 Rerunning all 4 benchmarks (both 16MB and 32MB wal_buffers on both
 machines) with fsync=off (as well as synchronous_commit=off still)
 might help clarify things.

 I reran the 32-client benchmark on the IBM machine with fsync=off and got 
 this:

 32MB: tps = 26809.442903 (including connections establishing)
 16MB: tps = 26651.320145 (including connections establishing)

 That's a speedup of nearly a factor of two, so clearly fsync-related
 stalls are a big problem here, even with wal_buffers cranked up
 through the ceiling.

 And here's a tps plot with wal_buffers = 16MB, fsync = off.  The
 performance still bounces up and down, so there's obviously some other
 factor contributing to latency spikes

Initialization of WAL file? Do the latency spikes disappear if you start
benchmark after you prepare lots of the recycled WAL files?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] wal_buffers, redux

2012-03-13 Thread Robert Haas
On Tue, Mar 13, 2012 at 10:02 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Mar 14, 2012 at 7:20 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Mar 13, 2012 at 3:48 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Mar 12, 2012 at 4:45 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 Rerunning all 4 benchmarks (both 16MB and 32MB wal_buffers on both
 machines) with fsync=off (as well as synchronous_commit=off still)
 might help clarify things.

 I reran the 32-client benchmark on the IBM machine with fsync=off and got 
 this:

 32MB: tps = 26809.442903 (including connections establishing)
 16MB: tps = 26651.320145 (including connections establishing)

 That's a speedup of nearly a factor of two, so clearly fsync-related
 stalls are a big problem here, even with wal_buffers cranked up
 through the ceiling.

 And here's a tps plot with wal_buffers = 16MB, fsync = off.  The
 performance still bounces up and down, so there's obviously some other
 factor contributing to latency spikes

 Initialization of WAL file? Do the latency spikes disappear if you start
 benchmark after you prepare lots of the recycled WAL files?

The latency spikes seem to correspond to checkpoints, so I don't think
that's it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] wal_buffers, redux

2012-03-13 Thread Robert Haas
On Tue, Mar 13, 2012 at 6:44 PM, Josh Berkus j...@agliodbs.com wrote:
 That's a speedup of nearly a factor of two, so clearly fsync-related
 stalls are a big problem here, even with wal_buffers cranked up
 through the ceiling.

 Hmmm.   Do you have any ability to test on XFS?

It seems I do.

XFS, with fsync = on:
tps = 14746.687499 (including connections establishing)
XFS, with fsync = off:
tps = 25121.876560 (including connections establishing)

No real dramatic difference there, maybe a bit slower.

On further thought, it may be that this is just a simple case of too
many checkpoints.  With fsync=off, we don't have to actually write all
that dirty data back to disk.  I'm going to try cranking up
checkpoint_segments and see what happens.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] wal_buffers, redux

2012-03-12 Thread Robert Haas
On Sun, Mar 11, 2012 at 11:51 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sun, Mar 11, 2012 at 12:55 PM, Robert Haas robertmh...@gmail.com wrote:
 I've finally been able to run some more tests of the effect of
 adjusting wal_buffers to values higher than 16MB.  I ran the test on
 the 16 core (x 4 hw threads/core) IBM POWER7 machine, with my usual
 configuration settings:

 shared_buffers = 8GB
 maintenance_work_mem = 1GB
 synchronous_commit = off
 checkpoint_segments = 300
 checkpoint_timeout = 15min
 checkpoint_completion_target = 0.9
 wal_writer_delay = 20ms

 I ran three 30-minute tests at scale factor 300 with wal_buffers set
 at various values from 16MB up to 160MB, in multiples of 16MB, using
 pgbench with 32 clients and 32 threads in each case.  The short
 version is that 32MB seems to be significantly better than 16MB, by
 about 1000 tps, and after that it gets murky; full results are below.

 Currently the max of wal_buffers when it's set to -1 is 16MB (i.e., the size
 of one WAL file). Given your result, should we increase that max to 32MB?

I think that might be a good idea, although I'm not entirely convinced
that we understand why increasing wal_buffers is helping as much as it
is.  I stuck an elog() into AdvanceXLInsertBuffer() to complain in the
case that we were writing buffers while holding the insert lock.
Then, I reran 30-minute tests 32 clients, one with wal_buffers=16MB
and the other wal_buffers=32MB.  On the 16MB test, the elog() fired 15
times in a single second shortly after the start of the test, and then
9 more times over the rest of the test.  On the 32MB test, the elog()
fired a total 6 times over the course of the test.  The first test got
14320 tps, while the second got 15026 tps.  I find that quite
surprising, because although WAL buffer wraparound is certainly bad
(on this test, it probably brings the system completely to a halt
until fsync() finishes) it really shouldn't lock up the system for
multiple seconds at a time.  And yet that's what it would need to be
doing to account for the tps discrepancy on this test, considering how
rarely it occurs.
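To quantify that surprise (my arithmetic from the figures just given): if the whole tps gap were charged to the 24 wraparound events seen at 16MB, each one would indeed have to stall the system for seconds.

```python
duration_s = 1800           # 30-minute runs
tps_16mb, tps_32mb = 14320, 15026
wraparound_events = 15 + 9  # elog() firings during the 16MB run

# Transactions "missing" from the 16MB run, expressed as stalled time
# at the 32MB run's throughput:
lost_seconds = (tps_32mb - tps_16mb) * duration_s / tps_32mb
per_event = lost_seconds / wraparound_events

print(round(lost_seconds, 1))  # 84.6 seconds of implied stall in total
print(round(per_event, 2))     # 3.52 seconds per wraparound event
```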

Studying AdvanceXLInsertBuffer() a bit more, I'm wondering if the
problem isn't so much - or isn't only - that it's expensive to write
buffers while also holding WALInsertLock.  Maybe it's too expensive
even to acquire WalWriteLock in the first place - that is, the real
problem isn't so much the wraparound condition itself, but the expense
of testing whether a possible wraparound has actually occurred.  A
quick test suggests that we acquire WALWriteLock here much more often
than we actually write anything while holding it, and that we get a
big burst of WALWriteLock acquisitions here immediately after a
checkpoint.  I don't have any proof that this is what's causing the
tps drop with smaller wal_buffers, but I think there has to be
something other than an actual wraparound condition causing problems
here, because that just doesn't seem to happen frequently enough to be
an issue.

Anyway, maybe none of that matters at the moment.  Perhaps it's enough
to know that wal_buffers > 16MB can help, and just bump up the maximum
auto-tuned value a bit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] wal_buffers, redux

2012-03-12 Thread Jeff Janes
On Sat, Mar 10, 2012 at 7:55 PM, Robert Haas robertmh...@gmail.com wrote:
 I've finally been able to run some more tests of the effect of
 adjusting wal_buffers to values higher than 16MB.  I ran the test on
 the 16 core (x 4 hw threads/core) IBM POWER7 machine, with my usual
 configuration settings:

 shared_buffers = 8GB
 maintenance_work_mem = 1GB
 synchronous_commit = off
 checkpoint_segments = 300
 checkpoint_timeout = 15min
 checkpoint_completion_target = 0.9
 wal_writer_delay = 20ms

 I ran three 30-minute tests at scale factor 300 with wal_buffers set
 at various values from 16MB up to 160MB, in multiples of 16MB, using
 pgbench with 32 clients and 32 threads in each case.  The short
 version is that 32MB seems to be significantly better than 16MB, by
 about 1000 tps, and after that it gets murky; full results are below.

On Nate Boley's machine, the difference was ~100% increase rather than
~10%.  Do you think the difference is in the CPU architecture, or the
IO subsystem?

Also, do you have the latency numbers?

Cheers,

Jeff



Re: [HACKERS] wal_buffers, redux

2012-03-12 Thread Robert Haas
On Mon, Mar 12, 2012 at 12:32 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Nate Boley's machine, the difference was ~100% increase rather than
 ~10%.

Oh, right.  I had forgotten how dramatic the changes were in those
test runs.  I guess I should be happy that the absolute numbers on
this machine were as high as they were.  This machine seems to be
beating that one on every metric.

 Do you think the difference is in the CPU architecture, or the
 IO subsystem?

That is an excellent question.  I tried looking at vmstat output, but
a funny thing kept happening: periodically, the iowait column would
show a gigantic negative number instead of a number between 0 and 100.
This makes me a little chary of believing any of it.  Even if I did,
I'm not sure that would fully answer the question.  So I guess the
short answer is that I don't know, and I'm not even sure how I might
go about figuring it out.  Any ideas?

 Also, do you have the latency numbers?

Not at the moment, but I'll generate them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] wal_buffers, redux

2012-03-12 Thread Jeff Janes
On Mon, Mar 12, 2012 at 10:55 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Mar 12, 2012 at 12:32 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Nate Boley's machine, the difference was ~100% increase rather than
 ~10%.

 Oh, right.  I had forgotten how dramatic the changes were in those
 test runs.  I guess I should be happy that the absolute numbers on
 this machine were as high as they were.  This machine seems to be
 beating that one on every metric.

 Do you think the difference is in the CPU architecture, or the
 IO subsystem?

 That is an excellent question.  I tried looking at vmstat output, but
 a funny thing kept happening: periodically, the iowait column would
 show a gigantic negative number instead of a number between 0 and 100.

On which machine was that happening?

  This makes me a little chary of believing any of it.  Even if I did,
 I'm not sure that would fully answer the question.  So I guess the
 short answer is that I don't know, and I'm not even sure how I might
 go about figuring it out.  Any ideas?

Rerunning all 4 benchmarks (both 16MB and 32MB wal_buffers on both
machines) with fsync=off (as well as synchronous_commit=off still)
might help clarify things.
If it increases the TPS of Nate@16MB, but doesn't change the other 3
situations much, then that suggests the IO system is driving it.
Basically moving up to 32MB is partially inoculating against slow
fsyncs upon log switch on that machine.

Does the POWER7 have a nonvolatile cache?  What happened with
synchronous_commit=on?

Also, since all data fits in shared_buffers, making
checkpoint_segments and checkpoint_timeout be larger than the
benchmark period should remove the only other source of writing from
the system.  With no checkpoints, no evictions, and no fsyncs, it is
unlikely for the remaining IO to be the bottleneck.

Cheers,

Jeff



Re: [HACKERS] wal_buffers, redux

2012-03-12 Thread Robert Haas
On Mon, Mar 12, 2012 at 4:45 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 Do you think the difference is in the CPU architecture, or the
 IO subsystem?

 That is an excellent question.  I tried looking at vmstat output, but
 a funny thing kept happening: periodically, the iowait column would
 show a gigantic negative number instead of a number between 0 and 100.

 On which machine was that happening?

The IBM server.

  This makes me a little chary of believing any of it.  Even if I did,
 I'm not sure that would fully answer the question.  So I guess the
 short answer is that I don't know, and I'm not even sure how I might
 go about figuring it out.  Any ideas?

 Rerunning all 4 benchmarks (both 16MB and 32MB wal_buffers on both
 machines) with fsync=off (as well as synchronous_commit=off still)
 might help clarify things.
 If it increases the TPS of Nate@16MB, but doesn't change the other 3
 situations much, then that suggests the IO system is driving it.
 Basically moving up to 32MB is partially inoculating against slow
 fsyncs upon log switch on that machine.

Mmm, yeah.  Although, I think it might have been 64MB rather than 32MB
that I tested on that machine.

 Does the POWER7 have a nonvolatile cache?  What happened with
 synchronous_commit=on?

Haven't tried that yet.

 Also, since all data fits in shared_buffers, making
 checkpoint_segments and checkpoint_timeout be larger than the
 benchmark period should remove the only other source of writing from
 the system.  With no checkpoints, no evictions, and no fsyncs, it is
 unlikely for the remaining IO to be the bottleneck.

Another thing to test.

Meanwhile, here are some TPS graphs at 16MB and 32MB on the IBM POWER7
machine.  32 clients, 1800 seconds, scale factor 300, synchronous
commit off.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
attachment: tps-16MB.png
attachment: tps-32MB.png


Re: [HACKERS] wal_buffers, redux

2012-03-11 Thread Fujii Masao
On Sun, Mar 11, 2012 at 12:55 PM, Robert Haas robertmh...@gmail.com wrote:
 I've finally been able to run some more tests of the effect of
 adjusting wal_buffers to values higher than 16MB.  I ran the test on
 the 16 core (x 4 hw threads/core) IBM POWER7 machine, with my usual
 configuration settings:

 shared_buffers = 8GB
 maintenance_work_mem = 1GB
 synchronous_commit = off
 checkpoint_segments = 300
 checkpoint_timeout = 15min
 checkpoint_completion_target = 0.9
 wal_writer_delay = 20ms

 I ran three 30-minute tests at scale factor 300 with wal_buffers set
 at various values from 16MB up to 160MB, in multiples of 16MB, using
 pgbench with 32 clients and 32 threads in each case.  The short
 version is that 32MB seems to be significantly better than 16MB, by
 about 1000 tps, and after that it gets murky; full results are below.

Currently the max of wal_buffers when it's set to -1 is 16MB (i.e., the size
of one WAL file). Given your result, should we increase that max to 32MB?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



[HACKERS] wal_buffers, redux

2012-03-10 Thread Robert Haas
I've finally been able to run some more tests of the effect of
adjusting wal_buffers to values higher than 16MB.  I ran the test on
the 16 core (x 4 hw threads/core) IBM POWER7 machine, with my usual
configuration settings:

shared_buffers = 8GB
maintenance_work_mem = 1GB
synchronous_commit = off
checkpoint_segments = 300
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
wal_writer_delay = 20ms

I ran three 30-minute tests at scale factor 300 with wal_buffers set
at various values from 16MB up to 160MB, in multiples of 16MB, using
pgbench with 32 clients and 32 threads in each case.  The short
version is that 32MB seems to be significantly better than 16MB, by
about 1000 tps, and after that it gets murky; full results are below.

16MB tps = 13069.420658 (including connections establishing)
16MB tps = 14370.293228 (including connections establishing)
16MB tps = 13787.219680 (including connections establishing)
32MB tps = 14916.068968 (including connections establishing)
32MB tps = 14746.448728 (including connections establishing)
32MB tps = 15110.676467 (including connections establishing)
48MB tps = 15111.981870 (including connections establishing)
48MB tps = 12824.628192 (including connections establishing)
48MB tps = 15090.280081 (including connections establishing)
64MB tps = 15382.740815 (including connections establishing)
64MB tps = 12367.942312 (including connections establishing)
64MB tps = 15195.405382 (including connections establishing)
80MB tps = 15346.080326 (including connections establishing)
80MB tps = 12791.192216 (including connections establishing)
80MB tps = 14780.804054 (including connections establishing)
96MB tps = 15476.229392 (including connections establishing)
96MB tps = 15426.012162 (including connections establishing)
96MB tps = 15548.671849 (including connections establishing)
112MB tps = 15400.669675 (including connections establishing)
112MB tps = 15676.416378 (including connections establishing)
112MB tps = 15016.651083 (including connections establishing)
128MB tps = 15811.463522 (including connections establishing)
128MB tps = 15590.343669 (including connections establishing)
128MB tps = 15256.867665 (including connections establishing)
144MB tps = 15842.131696 (including connections establishing)
144MB tps = 15669.880640 (including connections establishing)
144MB tps = 15753.460908 (including connections establishing)
160MB tps = 15658.726637 (including connections establishing)
160MB tps = 15599.600752 (including connections establishing)
160MB tps = 15311.198480 (including connections establishing)
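Averaging the three runs per setting makes the headline comparison concrete (a quick reduction of the table above; only the first few settings shown):

```python
# tps per run, keyed by wal_buffers in MB, copied from the results above
results = {
    16: [13069.420658, 14370.293228, 13787.219680],
    32: [14916.068968, 14746.448728, 15110.676467],
    48: [15111.981870, 12824.628192, 15090.280081],
    64: [15382.740815, 12367.942312, 15195.405382],
}

means = {mb: sum(runs) / len(runs) for mb, runs in results.items()}
for mb, mean in sorted(means.items()):
    print(f"{mb}MB: {mean:.0f} tps")

print(f"32MB over 16MB: {means[32] - means[16]:+.0f} tps")  # +1182 tps
```

The 48MB and 64MB means are dragged down by one slow run each, which is the "murky" part of the picture.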

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] wal_buffers

2012-02-20 Thread Simon Riggs
On Mon, Feb 20, 2012 at 4:10 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Feb 20, 2012 at 3:08 AM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Feb 19, 2012 at 9:46 AM, Euler Taveira de Oliveira
 eu...@timbira.com wrote:
 On 19-02-2012 02:24, Robert Haas wrote:
 I have attached tps scatterplots.  The obvious conclusion appears to
 be that, with only 16MB of wal_buffers, the buffer wraps around with
 some regularity

 Isn't it useful to print some messages on the log when we have wrap 
 around?
 In this case, we have an idea that wal_buffers needs to be increased.

 I was thinking about that.  I think that what might be more useful
 than a log message is a counter somewhere in shared memory.  Logging
 imposes a lot of overhead, which is exactly what we don't want here,
 and the volume might be quite high on a system that is bumping up
 against this problem.  Of course then the question is... how would we
 expose the counter value?

 There is no existing statistics view suitable to include such information.
 What about defining pg_stat_xlog or something?

Perhaps pg_stat_perf so we don't need to find a new home every time.

Thinking about it, I think renaming pg_stat_bgwriter would make more sense.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services



Re: [HACKERS] wal_buffers

2012-02-20 Thread Robert Haas
On Mon, Feb 20, 2012 at 3:59 AM, Simon Riggs si...@2ndquadrant.com wrote:
 There is no existing statistics view suitable to include such information.
 What about defining pg_stat_xlog or something?

 Perhaps pg_stat_perf so we don't need to find a new home every time.

 Thinking about it, I think renaming pg_stat_bgwriter would make more sense.

When we created pg_stat_reset_shared(text), we seemed to be
contemplating the idea of multiple sets of shared counters identified
by names -- bgwriter for the background writer, and maybe other things
for other subsystems.  So we'd have to think about how to adjust that.
I do agree with you that it seems a shame to invent a whole new view
for one counter...

Another thought is that I'm not sure it makes sense to run this
through the stats system at all.  We could regard it as a shared
memory counter protected by one of the LWLocks involved, which would
probably be quite a bit cheaper - just one machine instruction to
increment it at need.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] wal_buffers

2012-02-19 Thread Euler Taveira de Oliveira
On 19-02-2012 02:24, Robert Haas wrote:
 I have attached tps scatterplots.  The obvious conclusion appears to
 be that, with only 16MB of wal_buffers, the buffer wraps around with
 some regularity

Isn't it useful to print some messages on the log when we have wrap around?
In this case, we have an idea that wal_buffers needs to be increased.


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: [HACKERS] wal_buffers

2012-02-19 Thread Robert Haas
On Sun, Feb 19, 2012 at 9:46 AM, Euler Taveira de Oliveira
eu...@timbira.com wrote:
 On 19-02-2012 02:24, Robert Haas wrote:
 I have attached tps scatterplots.  The obvious conclusion appears to
 be that, with only 16MB of wal_buffers, the buffer wraps around with
 some regularity

 Isn't it useful to print some messages on the log when we have wrap around?
 In this case, we have an idea that wal_buffers needs to be increased.

I was thinking about that.  I think that what might be more useful
than a log message is a counter somewhere in shared memory.  Logging
imposes a lot of overhead, which is exactly what we don't want here,
and the volume might be quite high on a system that is bumping up
against this problem.  Of course then the question is... how would we
expose the counter value?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] wal_buffers

2012-02-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sun, Feb 19, 2012 at 9:46 AM, Euler Taveira de Oliveira
 eu...@timbira.com wrote:
 Isn't it useful to print some messages on the log when we have wrap around?
 In this case, we have an idea that wal_buffers needs to be increased.

 I was thinking about that.  I think that what might be more useful
 than a log message is a counter somewhere in shared memory.  Logging
 imposes a lot of overhead, which is exactly what we don't want here,
 and the volume might be quite high on a system that is bumping up
 against this problem.  Of course then the question is... how would we
 expose the counter value?

Why do you need a counter, other than the current LSN?  Surely the
number of WAL buffer ring cycles can be deduced directly from that.
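The deduction Tom describes is plain arithmetic: the WAL insert position (LSN) is a monotonically increasing byte offset, so the number of complete laps around the buffer ring is the LSN divided by the size of wal_buffers. A minimal sketch:

```python
# Ring cycles deduced from the current LSN, per Tom's suggestion.
WAL_BUFFERS = 16 * 1024 * 1024          # 16MB, the autotuned upper bound

def ring_cycles(lsn_bytes, wal_buffers=WAL_BUFFERS):
    # Each wal_buffers-sized chunk of WAL written is one full lap of the ring.
    return lsn_bytes // wal_buffers

# After writing 1GB of WAL, a 16MB ring has been lapped 64 times.
cycles = ring_cycles(1024 * 1024 * 1024)
```

This counts laps, not stalls; whether a lap actually induced a wait depends on how far the flusher had gotten, which is the distinction raised downthread.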

regards, tom lane



Re: [HACKERS] wal_buffers

2012-02-19 Thread Simon Riggs
On Sun, Feb 19, 2012 at 6:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sun, Feb 19, 2012 at 9:46 AM, Euler Taveira de Oliveira
 eu...@timbira.com wrote:
 Isn't it useful to print some messages on the log when we have wrap 
 around?
 In this case, we have an idea that wal_buffers needs to be increased.

 I was thinking about that.  I think that what might be more useful
 than a log message is a counter somewhere in shared memory.  Logging
 imposes a lot of overhead, which is exactly what we don't want here,
 and the volume might be quite high on a system that is bumping up
 against this problem.  Of course then the question is... how would we
 expose the counter value?

 Why do you need a counter, other than the current LSN?  Surely the
 number of WAL buffer ring cycles can be deduced directly from that.

The problem isn't how many times its cycled, the issue is whether
there was a wait induced by needing to flush wal buffers because of
too many writes. You can't count those waits in the way you suggest,
though you can calculate an upper limit on them, but that's not very
useful.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] wal_buffers

2012-02-19 Thread Fujii Masao
On Mon, Feb 20, 2012 at 3:08 AM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Feb 19, 2012 at 9:46 AM, Euler Taveira de Oliveira
 eu...@timbira.com wrote:
 On 19-02-2012 02:24, Robert Haas wrote:
 I have attached tps scatterplots.  The obvious conclusion appears to
 be that, with only 16MB of wal_buffers, the buffer wraps around with
 some regularity

 Isn't it useful to print some messages on the log when we have wrap around?
 In this case, we have an idea that wal_buffers needs to be increased.

 I was thinking about that.  I think that what might be more useful
 than a log message is a counter somewhere in shared memory.  Logging
 imposes a lot of overhead, which is exactly what we don't want here,
 and the volume might be quite high on a system that is bumping up
 against this problem.  Of course then the question is... how would we
 expose the counter value?

There is no existing statistics view suitable to include such information.
What about defining pg_stat_xlog or something?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] wal_buffers

2012-02-19 Thread Greg Smith

On 02/19/2012 12:24 AM, Robert Haas wrote:

I think we might want to consider
adjusting our auto-tuning formula for wal_buffers to allow for a
higher cap, although this is obviously not enough data to draw any
firm conclusions.


That's an easy enough idea to throw into my testing queue.  The 16MB 
auto-tuning upper bound was just the easiest number to suggest that was 
obviously useful and unlikely to be wasteful.  One of the reasons 
wal_buffers remains a user-visible parameter is that no one ever 
really did an analysis of what its useful upper bound is--and that 
number might move up as other bottlenecks are smashed too.
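For reference, the auto-tuning rule under discussion can be sketched roughly as follows (a hedged approximation of the C logic, not the actual source): with wal_buffers = -1 the server picks shared_buffers / 32, clamped between a few pages and one 16MB WAL segment — the upper bound Greg proposes re-testing.

```python
# Approximate sketch of the wal_buffers = -1 auto-tuning rule.
XLOG_BLCKSZ = 8192                          # WAL block size, bytes
WAL_SEGMENT = 16 * 1024 * 1024              # one WAL segment, bytes

def autotune_wal_buffers(shared_buffers_bytes):
    guess = shared_buffers_bytes // 32      # 1/32 of shared_buffers
    guess = min(guess, WAL_SEGMENT)         # cap: one 16MB segment
    guess = max(guess, 8 * XLOG_BLCKSZ)     # floor: a handful of pages
    return guess

cap = autotune_wal_buffers(8 * 1024 ** 3)   # 8GB shared_buffers hits the cap
```

On the 8GB-shared_buffers box in Robert's benchmark, the formula hits the 16MB cap immediately, which is why raising the cap is the interesting experiment.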


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




[HACKERS] wal_buffers

2012-02-18 Thread Robert Haas
Just for kicks, I ran two 30-minute pgbench tests at scale factor 300
tonight on Nate Boley's machine, with -n -l -c 32 -j 32.  The
configurations were identical, except that on one of them, I set
wal_buffers=64MB.  It seemed to make quite a lot of difference:

wal_buffers not set (thus, 16MB):
tps = 3162.594605 (including connections establishing)

wal_buffers=64MB:
tps = 6164.194625 (including connections establishing)

Rest of config: shared_buffers = 8GB, maintenance_work_mem = 1GB,
synchronous_commit = off, checkpoint_segments = 300,
checkpoint_timeout = 15min, checkpoint_completion_target = 0.9,
wal_writer_delay = 20ms

I have attached tps scatterplots.  The obvious conclusion appears to
be that, with only 16MB of wal_buffers, the buffer wraps around with
some regularity: we can't insert more WAL because the buffer we need
to use still contains WAL that hasn't yet been fsync'd, leading to
long stalls.  More buffer space ameliorates the problem.  This is not
very surprising, when you think about it: it's clear that the peak tps
rate approaches 18k/s on these tests; right after a checkpoint, every
update will force a full page write - that is, a WAL record > 8kB.  So
we'll fill up a 16MB WAL segment in about a tenth of a second.  That
doesn't leave much breathing room.  I think we might want to consider
adjusting our auto-tuning formula for wal_buffers to allow for a
higher cap, although this is obviously not enough data to draw any
firm conclusions.
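The back-of-the-envelope arithmetic in that paragraph checks out, as a quick sketch shows:

```python
# Checking the fill-rate claim: ~18k updates/s, each carrying at least
# one full-page image (> 8kB of WAL) right after a checkpoint.
tps = 18_000
wal_per_update = 8192                       # lower bound: one full page image
fill_rate = tps * wal_per_update            # bytes of WAL per second
seconds_to_fill_16mb = (16 * 1024 * 1024) / fill_rate
```

At roughly 140MB/s of WAL generation, 16MB of buffer is about a tenth of a second of headroom, so any fsync stall longer than that forces inserters to wait.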

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
attachment: tps-master-64mb.png
attachment: tps-master.png


Re: GUC assign hooks (was Re: [HACKERS] wal_buffers = -1 and SIGHUP)

2011-04-04 Thread Kevin Grittner
Robert Haas  wrote:
 On Sun, Apr 3, 2011 at 1:26 PM, Tom Lane  wrote:
 
 It would probably take less than a day to flesh out this idea and
 make it happen, but it does seem like a rather large change for
 late alpha.
 
 what we're trying to avoid is committing new stuff that may require
 additional cleanup, not cleaning up the stuff we already did
 commit. Once we get to beta I'll be less enthusiastic about making
 changes like this
 
+1 for fixing it, with full agreement with Robert's project
management perspective on the issue.
 
Having worked in this area a bit I definitely see the need in
general, and for auto-tuning we pretty much have to do this to get it
right.  I think we should be edging into more auto-tuning
capabilities as we figure them out, making this all the more
important.
 
-Kevin



Re: GUC assign hooks (was Re: [HACKERS] wal_buffers = -1 and SIGHUP)

2011-04-04 Thread Tom Lane
I wrote:
 IMO the real problem is essentially that GUC assign hooks have two
 functions, checking and canonicalization of the value-to-be-stored
 versus executing secondary actions when an assignment is made; and
 there's no way to get at just the first one.  So we cannot canonicalize
 the value first and then see if it's equal to the current setting.
 I think the only real fix is to change the API for assign hooks.  This
 is a bit annoying but it's not like we haven't ever done that before.
 I'm thinking about splitting assign hooks into two functions, along the
 lines of

   bool check_hook (datatype *new_value, GucSource source)
   bool assign_hook (datatype new_value, GucSource source)

After perusing the existing assign_hook functions, I have some further
thoughts about this proposal.

* Many of the existing assign hooks do a nontrivial amount of
computation (such as catalog lookups) to derive internal values from the
presented string; examples include assign_temp_tablespaces and
assign_timezone.  A naive implementation of the above design would
require the assign_hook to repeat this computation after the check_hook
already did it, which of course is undesirable.

* Assign hooks that do catalog lookups need special pushups to avoid
having to do such lookups while restoring a previous GUC setting during
transaction abort (since lookups can't safely be performed in an aborted
transaction).  Up to now we've used ad-hoc techniques for each such
variable, as seen for instance in assign_session_authorization.  The
usual idea is to modify the original string to store additional data,
which then requires a custom show_hook to ensure only the original part
of the string is shown to users.  The messiest aspect of this is that
it must be possible to reliably tell a modified string from original
user input.

I think that we can avoid the first problem and clean up the second
problem if we do this:

1. Code guc.c so that the check_hook is only applied to original user
input.  When restoring a previous setting during abort (which
necessarily will have been passed through the check_hook at an earlier
time), only call the assign_hook.

2. Guarantee that the string pointer passed to the assign_hook is
exactly what was returned by the check_hook, ie, guc.c is not allowed
to duplicate or copy that string.

Given these rules, a check_hook and assign_hook could cooperate to store
additional data in what guc.c thinks is just a pointer to a string
value, ie, there can be more data after the terminating \0.  The
assign_hook could work off just this additional data without ever doing
a catalog lookup.  No special show_hook is needed.
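A byte-level toy model of that trick (the real mechanism is C strings inside guc.c; the helper names here are invented for illustration): the check_hook returns one allocation holding the canonical user-visible string, its terminating NUL, and derived binary data packed after the NUL. Anything that treats the buffer as a C string sees only the visible part, while the assign_hook knows to look past the NUL.

```python
import struct

def pack_guc_string(visible, derived_oid):
    # One "allocation": visible string, NUL terminator, then hidden data.
    return visible.encode() + b"\x00" + struct.pack("<I", derived_oid)

def visible_part(buf):
    # What SHOW (or any C-string consumer) would display: up to the NUL.
    return buf.split(b"\x00", 1)[0].decode()

def hidden_part(buf):
    # What the assign_hook would use: the bytes after the NUL.
    nul = buf.index(b"\x00")
    return struct.unpack("<I", buf[nul + 1:nul + 5])[0]

buf = pack_guc_string("some_role", 16384)
```

The payoff is exactly what the paragraph describes: no repeated catalog lookup during abort, and no custom show_hook to hide the extra data.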

Of course this only works for string GUC variables, but I'm finding it
hard to visualize a case where a bool, int, float, or enum GUC could
need a catalog lookup to interpret.  We could possibly legislate that
all of these are separately malloc'd to allow the same type of trick to
be applied across the board, but I think that's overkill.  We can just
tell people they must use a string GUC if they need hidden data.

This technique would need documentation of course, but at least it
*would* be documented rather than ad-hoc for each variable.

Another variant would be to allow the check_hook to pass back a separate
void * value that could be passed on to the assign_hook, containing
any necessary derived data.  This is logically a bit cleaner, and would
work for all types of GUC variables; but it would make things messier in
guc.c since there would be an additional value to pass around.  I'm not
convinced it's worth that, but could be talked into it if anyone feels
strongly about it.

Another thing I was reminded of while perusing the code is the comment
for GUC_complaint_elevel:

 * At some point it'd be nice to replace this with a mechanism that allows
 * the custom message to become the DETAIL line of guc.c's generic message.

The reason we invented GUC_complaint_elevel in the first place was to
avoid a change in the signatures of assign hooks.  If we are making such
a change, now would be the time to fix it, because we're never gonna fix
it otherwise.  I see a few ways we could do it:

1. Add a char **errdetail parameter to assign_hooks, which guc.c would
initialize to NULL before call.  If the hook stores a non-null pointer
there, guc.c would include that string as errdetail.  This is the least
effort from guc.c's viewpoint, but will be relatively painful to use
from the hook's standpoint, because it'd generally have to palloc some
space, or maybe even set up a StringInfo buffer to contain the generated
message.

2. Add a char *errdetail parameter to assign_hooks, which points at
a local-variable buffer in the calling routine, of a well-known size
(think GUC_ERRDETAIL_BUFSIZE macro in guc.h).  Then hooks do something
like
snprintf(errdetail, GUC_ERRDETAIL_BUFSIZE, _(format), ...);
to return an error detail string.

3. Create a function in guc.c for hooks to call, along the 

Re: GUC assign hooks (was Re: [HACKERS] wal_buffers = -1 and SIGHUP)

2011-04-04 Thread Robert Haas
On Mon, Apr 4, 2011 at 2:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 IMO the real problem is essentially that GUC assign hooks have two
 functions, checking and canonicalization of the value-to-be-stored
 versus executing secondary actions when an assignment is made; and
 there's no way to get at just the first one.  So we cannot canonicalize
 the value first and then see if it's equal to the current setting.
 I think the only real fix is to change the API for assign hooks.  This
 is a bit annoying but it's not like we haven't ever done that before.
 I'm thinking about splitting assign hooks into two functions, along the
 lines of

       bool check_hook (datatype *new_value, GucSource source)
       bool assign_hook (datatype new_value, GucSource source)

 After perusing the existing assign_hook functions, I have some further
 thoughts about this proposal.

 * Many of the existing assign hooks do a nontrivial amount of
 computation (such as catalog lookups) to derive internal values from the
 presented string; examples include assign_temp_tablespaces and
 assign_timezone.  A naive implementation of the above design would
 require the assign_hook to repeat this computation after the check_hook
 already did it, which of course is undesirable.

 * Assign hooks that do catalog lookups need special pushups to avoid
 having to do such lookups while restoring a previous GUC setting during
 transaction abort (since lookups can't safely be performed in an aborted
 transaction).  Up to now we've used ad-hoc techniques for each such
 variable, as seen for instance in assign_session_authorization.  The
 usual idea is to modify the original string to store additional data,
 which then requires a custom show_hook to ensure only the original part
 of the string is shown to users.  The messiest aspect of this is that
 it must be possible to reliably tell a modified string from original
 user input.

 I think that we can avoid the first problem and clean up the second
 problem if we do this:

 1. Code guc.c so that the check_hook is only applied to original user
 input.  When restoring a previous setting during abort (which
 necessarily will have been passed through the check_hook at an earlier
 time), only call the assign_hook.

 2. Guarantee that the string pointer passed to the assign_hook is
 exactly what was returned by the check_hook, ie, guc.c is not allowed
 to duplicate or copy that string.

 Given these rules, a check_hook and assign_hook could cooperate to store
 additional data in what guc.c thinks is just a pointer to a string
 value, ie, there can be more data after the terminating \0.  The
 assign_hook could work off just this additional data without ever doing
 a catalog lookup.  No special show_hook is needed.

The only thing this proposal has to recommend it is that the current
coding is even worse.

 Of course this only works for string GUC variables, but I'm finding it
 hard to visualize a case where a bool, int, float, or enum GUC could
 need a catalog lookup to interpret.  We could possibly legislate that
 all of these are separately malloc'd to allow the same type of trick to
 be applied across the board, but I think that's overkill.  We can just
 tell people they must use a string GUC if they need hidden data.

 This technique would need documentation of course, but at least it
 *would* be documented rather than ad-hoc for each variable.

 Another variant would be to allow the check_hook to pass back a separate
 void * value that could be passed on to the assign_hook, containing
 any necessary derived data.  This is logically a bit cleaner, and would
 work for all types of GUC variables; but it would make things messier in
 guc.c since there would be an additional value to pass around.  I'm not
 convinced it's worth that, but could be talked into it if anyone feels
 strongly about it.

 Another thing I was reminded of while perusing the code is the comment
 for GUC_complaint_elevel:

  * At some point it'd be nice to replace this with a mechanism that allows
  * the custom message to become the DETAIL line of guc.c's generic message.

 The reason we invented GUC_complaint_elevel in the first place was to
 avoid a change in the signatures of assign hooks.  If we are making such
 a change, now would be the time to fix it, because we're never gonna fix
 it otherwise.  I see a few ways we could do it:

 1. Add a char **errdetail parameter to assign_hooks, which guc.c would
 initialize to NULL before call.  If the hook stores a non-null pointer
 there, guc.c would include that string as errdetail.  This is the least
 effort from guc.c's viewpoint, but will be relatively painful to use
 from the hook's standpoint, because it'd generally have to palloc some
 space, or maybe even set up a StringInfo buffer to contain the generated
 message.

 2. Add a char *errdetail parameter to assign_hooks, which points at
 a local-variable buffer in the calling routine, of a well-known size
 (think 

Re: GUC assign hooks (was Re: [HACKERS] wal_buffers = -1 and SIGHUP)

2011-04-04 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Apr 4, 2011 at 2:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Given these rules, a check_hook and assign_hook could cooperate to store
 additional data in what guc.c thinks is just a pointer to a string
 value, ie, there can be more data after the terminating \0.  The
 assign_hook could work off just this additional data without ever doing
 a catalog lookup.  No special show_hook is needed.

 The only thing this proposal has to recommend it is that the current
 coding is even worse.

Well, if you don't like that, do you like this one?

 Another variant would be to allow the check_hook to pass back a separate
 void * value that could be passed on to the assign_hook, containing
 any necessary derived data.  This is logically a bit cleaner, and would
 work for all types of GUC variables; but it would make things messier in
 guc.c since there would be an additional value to pass around.  I'm not
 convinced it's worth that, but could be talked into it if anyone feels
 strongly about it.

If not, what do you suggest?

regards, tom lane



Re: GUC assign hooks (was Re: [HACKERS] wal_buffers = -1 and SIGHUP)

2011-04-04 Thread Robert Haas
On Mon, Apr 4, 2011 at 2:58 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Apr 4, 2011 at 2:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Given these rules, a check_hook and assign_hook could cooperate to store
 additional data in what guc.c thinks is just a pointer to a string
 value, ie, there can be more data after the terminating \0.  The
 assign_hook could work off just this additional data without ever doing
 a catalog lookup.  No special show_hook is needed.

 The only thing this proposal has to recommend it is that the current
 coding is even worse.

 Well, if you don't like that, do you like this one?

To be clear, it's certainly an improvement over what we have now.

 Another variant would be to allow the check_hook to pass back a separate
 void * value that could be passed on to the assign_hook, containing
 any necessary derived data.  This is logically a bit cleaner, and would
 work for all types of GUC variables; but it would make things messier in
 guc.c since there would be an additional value to pass around.  I'm not
 convinced it's worth that, but could be talked into it if anyone feels
 strongly about it.

I haven't really got the mental energy to think through all of this
right now in detail, but I think that might be better.  I think
there's more kludgery here than we're going to fix in one pass, so as
long as we're making improvements, I'm happy.  Is there any case for
using a Datum rather than a void * so people can pack a short quantity
in directly without allocating memory, or are we expecting this to
always be (say) a struct pointer?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: GUC assign hooks (was Re: [HACKERS] wal_buffers = -1 and SIGHUP)

2011-04-04 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Apr 4, 2011 at 2:58 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Another variant would be to allow the check_hook to pass back a separate
 void * value that could be passed on to the assign_hook, containing
 any necessary derived data.  This is logically a bit cleaner, and would
 work for all types of GUC variables; but it would make things messier in
 guc.c since there would be an additional value to pass around.  I'm not
 convinced it's worth that, but could be talked into it if anyone feels
 strongly about it.

 I haven't really got the mental energy to think through all of this
 right now in detail, but I think that might be better.  I think
 there's more kludgery here than we're going to fix in one pass, so as
 long as we're making improvements, I'm happy.  Is there any case for
 using a Datum rather than a void * so people can pack a short quantity
 in directly without allocating memory, or are we expecting this to
 always be (say) a struct pointer?

Well, I was intending to insist that the void* parameter point to a
single malloc'd block, so that guc.c could release it when the value was
no longer of interest by doing free().  If we don't say that, then we
are going to need a free hook for those objects, which is surely way
more notational overhead than is likely to be repaid for the occasional
cases where a single OID or whatever would be sufficient info.

regards, tom lane



Re: GUC assign hooks (was Re: [HACKERS] wal_buffers = -1 and SIGHUP)

2011-04-04 Thread Robert Haas
On Mon, Apr 4, 2011 at 3:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Apr 4, 2011 at 2:58 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Another variant would be to allow the check_hook to pass back a separate
 void * value that could be passed on to the assign_hook, containing
 any necessary derived data.  This is logically a bit cleaner, and would
 work for all types of GUC variables; but it would make things messier in
 guc.c since there would be an additional value to pass around.  I'm not
 convinced it's worth that, but could be talked into it if anyone feels
 strongly about it.

 I haven't really got the mental energy to think through all of this
 right now in detail, but I think that might be better.  I think
 there's more kludgery here than we're going to fix in one pass, so as
 long as we're making improvements, I'm happy.  Is there any case for
 using a Datum rather than a void * so people can pack a short quantity
 in directly without allocating memory, or are we expecting this to
 always be (say) a struct pointer?

 Well, I was intending to insist that the void* parameter point to a
 single malloc'd block, so that guc.c could release it when the value was
 no longer of interest by doing free().  If we don't say that, then we
 are going to need a free hook for those objects, which is surely way
 more notational overhead than is likely to be repaid for the occasional
 cases where a single OID or whatever would be sufficient info.

OK.  Please comment the crap out of whatever you do, or maybe even add
a README.  This stuff is just a bit arcane, and guideposts help a lot.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: GUC assign hooks (was Re: [HACKERS] wal_buffers = -1 and SIGHUP)

2011-04-04 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 OK.  Please comment the crap out of whatever you do, or maybe even add
 a README.  This stuff is just a bit arcane, and guideposts help a lot.

We already have a README for that ;-).  PFA, a patch to 
src/backend/utils/misc/README describing the proposed revised API.
If nobody has any objections, I'll get on with making this happen.

regards, tom lane

diff --git a/src/backend/utils/misc/README b/src/backend/utils/misc/README
index 881862a30b182a94cd71c6ed1a98dd6e6d5e51e0..221f595f8b72eb1cf48d318beb4aa8d3cb2b3411 100644
*** a/src/backend/utils/misc/README
--- b/src/backend/utils/misc/README
*** determining which setting is used.
*** 12,68 
  Per-Variable Hooks
  ------------------
  
! Each variable known to GUC can optionally have an assign_hook and/or
! a show_hook to provide customized behavior.  Assign hooks are used to
! perform validity checking on variable values (above and beyond what
! GUC can do).  They are also used to update any derived state that needs
! to change when a GUC variable is set.  Show hooks are used to modify
! the default SHOW display for a variable.
  
  If an assign_hook is provided, it points to a function of the signature
! 	bool assign_hook(newvalue, bool doit, GucSource source)
! where the type of newvalue matches the kind of variable.  This function
! is called immediately before actually setting the variable's value (so it
! can look at the actual variable to determine the old value).  If the
! function returns true then the assignment is completed; if it returns
! false then newvalue is considered invalid and the assignment is not
! performed.  If doit is false then the function should simply check
! validity of newvalue and not change any derived state.  The source parameter
! indicates where the new value came from.  If it is >= PGC_S_INTERACTIVE,
! then we are performing an interactive assignment (e.g., a SET command), and
! ereport(ERROR) is safe to do.  But when source < PGC_S_INTERACTIVE, we are
! reading a non-interactive option source, such as postgresql.conf.  In this
! case the assign_hook should *not* ereport but should just return false if it
! doesn't like the newvalue.
  
! If an assign_hook returns false then guc.c will report a generic invalid
! value for option FOO error message.  If you feel the need to provide a more
! specific error message, ereport() it using GUC_complaint_elevel(source)
! as the error level.  Note that this might return either ERROR or a lower level
! such as LOG, so the ereport call might or might not return.  If it does
! return, return false out of the assign_hook.
  
! For string variables, the signature for assign hooks is a bit different:
! 	const char *assign_hook(const char *newvalue,
! bool doit,
! GucSource source)
! The meanings of the parameters are the same as for the other types of GUC
! variables, but the return value is handled differently:
! 	NULL --- assignment fails (like returning false for other datatypes)
! 	newvalue --- assignment succeeds, assign the newvalue as-is
! 	malloc'd (not palloc'd!!!) string --- assign that value instead
! The third choice is allowed in case the assign_hook wants to return a
! canonical version of the new value.  For example, the assign_hook for
! datestyle always returns a string that includes both output and input
! datestyle options, although the input might have specified only one.
  
- Note that a string variable's assign_hook will NEVER be called with a NULL
- value for newvalue, since there would be no way to distinguish success
- and failure returns.  If the boot_val or reset_val for a string variable
- is NULL, it will just be assigned without calling the assign_hook.
- Therefore, a NULL boot_val should never be used in combination with an
- assign_hook that has side-effects, as the side-effects wouldn't happen
- during a RESET that re-institutes the boot-time setting.
  
  If a show_hook is provided, it points to a function of the signature
  	const char *show_hook(void)
--- 12,108 
  Per-Variable Hooks
  ------------------
  
! Each variable known to GUC can optionally have a check_hook, an
! assign_hook, and/or a show_hook to provide customized behavior.
! Check hooks are used to perform validity checking on variable values
! (above and beyond what GUC can do), to compute derived settings when
! nontrivial work is needed to do that, and optionally to canonicalize
! user-supplied values.  Assign hooks are used to update any derived state
! that needs to change when a GUC variable is set.  Show hooks are used to
! modify the default SHOW display for a variable.
! 
! 
! If a check_hook is provided, it points to a function of the signature
! 	bool check_hook(datatype *newvalue, void **extra, GucSource source)
! The newvalue argument is of type bool *, int *, double *, or char **
! for bool, int/enum, float, or string variables respectively.  The check
! function should validate the 

Re: GUC assign hooks (was Re: [HACKERS] wal_buffers = -1 and SIGHUP)

2011-04-04 Thread Robert Haas
On Mon, Apr 4, 2011 at 4:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 OK.  Please comment the crap out of whatever you do, or maybe even add
 a README.  This stuff is just a bit arcane, and guideposts help a lot.

 We already have a README for that ;-).  PFA, a patch to
 src/backend/utils/misc/README describing the proposed revised API.
 If nobody has any objections, I'll get on with making this happen.

Looks reasonable to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



GUC assign hooks (was Re: [HACKERS] wal_buffers = -1 and SIGHUP)

2011-04-03 Thread Tom Lane
I wrote:
 Robert Haas robertmh...@gmail.com writes:
 I had intended to commit Greg's patch with a show hook and an assign
 hook and the calculated value stored separately from the GUC variable,
 which I believe would avoid this is a problem, but Tom thought this
 way was better.  Unfortunately, my knowledge of the GUC machinery is
 insufficient to think of a way to avoid it, other than the one Tom
 already rejected.

 Mph ... I think this shows the error of my thinking :-(.  We at least
 need an assign hook here.  Will fix, since it was my oversight to begin
 with.

After thinking about this for awhile, I think the fundamental problem
is in the is_newvalue_equal() function, which as its comment states is
pretty cheesy:

/*
 * Attempt (badly) to detect if a proposed new GUC setting is the same
 * as the current value.
 *
 * XXX this does not really work because it doesn't account for the
 * effects of canonicalization of string values by assign_hooks.
 */

If you hold your head at a funny angle you can see replacement of -1
with a suitable default value as a form of canonicalization, so the
behavior Bernd complained of is exactly what the comment is talking
about.  We've not had complaints previously because is_newvalue_equal()
is only used for PGC_POSTMASTER variables, and few of those have assign
hooks that do any canonicalization.  But it is possible to trigger the
problem with unix_socket_directory, for example: try setting it to
something like '/tmp/bogus/..', and you'll see that pg_ctl reload
triggers a log message:

LOG:  parameter "unix_socket_directory" cannot be changed without restarting the server

Robert had suggested fixing this by kluging up wal_buffers' assign and
show hooks, but IIRC he never actually got that to work; I doubt it's
possible to make it work in the EXEC_BACKEND case without some
additional hacks to get the state to be properly replicated into child
processes.  Even if we could make it work, wal_buffers is not likely to
be the last variable that we'll want to allow auto-tuning of.  So
I think we ought to address the underlying problem instead of trying
to work around it in the variable-specific code for wal_buffers.

IMO the real problem is essentially that GUC assign hooks have two
functions, checking and canonicalization of the value-to-be-stored
versus executing secondary actions when an assignment is made; and
there's no way to get at just the first one.  So we cannot canonicalize
the value first and then see if it's equal to the current setting.
I think the only real fix is to change the API for assign hooks.  This
is a bit annoying but it's not like we haven't ever done that before.
I'm thinking about splitting assign hooks into two functions, along the
lines of

bool check_hook (datatype *new_value, GucSource source)
bool assign_hook (datatype new_value, GucSource source)

check_hook would validate the new value, and possibly change it (hence
the pass-by-reference parameter).  assign_hook would only be responsible
for executing secondary actions needed when an assignment is done.
The "doit" flag can go away since we'd not call the assign_hook at all
unless we actually want the assignment to occur.  I think most of the
existing uses might only need one or the other of these hooks, but
haven't gone through them yet.  It might be appropriate to change the
signatures some more while we're at it, in particular pass the desired
elog message level explicitly instead of making hooks infer it from the
source parameter.

It would probably take less than a day to flesh out this idea and make
it happen, but it does seem like a rather large change for late alpha.
If people don't want to do this now, I suggest that we just live with
the problem for 9.1.  It's purely cosmetic, and easy enough to work
around (just don't uncomment the default value).

regards, tom lane



Re: GUC assign hooks (was Re: [HACKERS] wal_buffers = -1 and SIGHUP)

2011-04-03 Thread Robert Haas
On Sun, Apr 3, 2011 at 1:26 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 IMO the real problem is essentially that GUC assign hooks have two
 functions, checking and canonicalization of the value-to-be-stored
 versus executing secondary actions when an assignment is made; and
 there's no way to get at just the first one.

Yes, I think that's right.  A related point is that the API for assign
hooks is not consistent across all data types: string assign hooks can
return a replacement value, whereas everyone else can only succeed or
fail.

 It would probably take less than a day to flesh out this idea and make
 it happen, but it does seem like a rather large change for late alpha.
 If people don't want to do this now, I suggest that we just live with
 the problem for 9.1.  It's purely cosmetic, and easy enough to work
 around (just don't uncomment the default value).

I think it's a pretty ugly wart, so I'm inclined to say go ahead and
fix it.  I'm not sure what alpha is for if it's not cleaning up the
detritus of all the stuff we've committed in the last 9 months.  AIUI,
what we're trying to avoid is committing new stuff that may require
additional cleanup, not cleaning up the stuff we already did commit.
Once we get to beta I'll be less enthusiastic about making changes
like this, but I think most of the testing that will get done is still
in front of us.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: GUC assign hooks (was Re: [HACKERS] wal_buffers = -1 and SIGHUP)

2011-04-03 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sun, Apr 3, 2011 at 1:26 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 IMO the real problem is essentially that GUC assign hooks have two
 functions, checking and canonicalization of the value-to-be-stored
 versus executing secondary actions when an assignment is made; and
 there's no way to get at just the first one.

 Yes, I think that's right.  A related point is that the API for assign
 hooks is not consistent across all data types: string assign hooks can
 return a replacement value, whereas everyone else can only succeed or
 fail.

Right.  In the original design we only foresaw the need to canonicalize
string values, so that's why it's like that.  This change will make it
more consistent.

regards, tom lane



Re: [HACKERS] wal_buffers = -1 and SIGHUP

2011-04-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Mar 31, 2011 at 8:38 AM, Bernd Helmle maili...@oopsware.de wrote:
 This might be nitpicking (or I'm currently missing something), but I
 recognized that setting wal_buffers = -1 always triggers the following on
 reload, even if nothing about wal_buffers has changed:
 
 $ pg_ctl reload
 LOG:  received SIGHUP, reloading configuration files
 LOG:  parameter "wal_buffers" cannot be changed without restarting the server
 
 This only happens when you have wal_buffers set to -1.

 This is a bug.  The root cause is that, on startup, we do this:

 if (xbuffers != XLOGbuffers)
 {
 snprintf(buf, sizeof(buf), "%d", xbuffers);
 SetConfigOption("wal_buffers", buf, PGC_POSTMASTER,
 PGC_S_OVERRIDE);
 }

 I had intended to commit Greg's patch with a show hook and an assign
 hook and the calculated value stored separately from the GUC variable,
 which I believe would avoid this problem, but Tom thought this
 way was better.  Unfortunately, my knowledge of the GUC machinery is
 insufficient to think of a way to avoid it, other than the one Tom
 already rejected.

Mph ... I think this shows the error of my thinking :-(.  We at least
need an assign hook here.  Will fix, since it was my oversight to begin
with.

regards, tom lane



[HACKERS] wal_buffers = -1 and SIGHUP

2011-03-31 Thread Bernd Helmle
This might be nitpicking (or I'm currently missing something), but I recognized
that setting wal_buffers = -1 always triggers the following on reload, even if
nothing about wal_buffers has changed:


$ pg_ctl reload
LOG:  received SIGHUP, reloading configuration files
LOG:  parameter "wal_buffers" cannot be changed without restarting the server

This only happens when you have wal_buffers set to -1.

--
Thanks

Bernd



Re: [HACKERS] wal_buffers = -1 and SIGHUP

2011-03-31 Thread Robert Haas
On Thu, Mar 31, 2011 at 8:38 AM, Bernd Helmle maili...@oopsware.de wrote:
 This might be nitpicking (or I'm currently missing something), but I
 recognized that setting wal_buffers = -1 always triggers the following on
 reload, even if nothing about wal_buffers has changed:

 $ pg_ctl reload
 LOG:  received SIGHUP, reloading configuration files
 LOG:  parameter "wal_buffers" cannot be changed without restarting the server

 This only happens when you have wal_buffers set to -1.

This is a bug.  The root cause is that, on startup, we do this:

if (xbuffers != XLOGbuffers)
{
snprintf(buf, sizeof(buf), "%d", xbuffers);
SetConfigOption("wal_buffers", buf, PGC_POSTMASTER,
PGC_S_OVERRIDE);
}

I had intended to commit Greg's patch with a show hook and an assign
hook and the calculated value stored separately from the GUC variable,
which I believe would avoid this problem, but Tom thought this
way was better.  Unfortunately, my knowledge of the GUC machinery is
insufficient to think of a way to avoid it, other than the one Tom
already rejected.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
