On Fri, Sep 5, 2014 at 3:09 PM, Robert Haas wrote:
> On Fri, Sep 5, 2014 at 4:17 AM, didier wrote:
>>> It's not the size of the array that's the problem; it's the size of
>>> the detonation when the allocation fails.
>>>
>> You can use a file backed memory array
>> Or because it's only a hint and
On Fri, Sep 5, 2014 at 4:17 AM, didier wrote:
>> It's not the size of the array that's the problem; it's the size of
>> the detonation when the allocation fails.
>>
> You can use a file backed memory array
> Or because it's only a hint and
> - keys are in buffers (BufferTag), right?
> - transition
hi
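
As a rough illustration of the ordering being discussed: a standalone sketch that sorts a handful of dirty-buffer tags by (tablespace, relation, fork, block) with qsort, so checkpoint writes would hit each file sequentially. The struct and field names below are simplifications for illustration, not PostgreSQL's actual BufferTag.

#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in for a buffer tag; field names are illustrative only. */
typedef struct DirtyTag
{
    unsigned tablespace;
    unsigned relation;
    unsigned fork;
    unsigned block;
} DirtyTag;

static int
tag_cmp(const void *a, const void *b)
{
    const DirtyTag *x = a;
    const DirtyTag *y = b;

    if (x->tablespace != y->tablespace)
        return x->tablespace < y->tablespace ? -1 : 1;
    if (x->relation != y->relation)
        return x->relation < y->relation ? -1 : 1;
    if (x->fork != y->fork)
        return x->fork < y->fork ? -1 : 1;
    if (x->block != y->block)
        return x->block < y->block ? -1 : 1;
    return 0;
}

int
main(void)
{
    /* A handful of made-up dirty buffers, in arrival order. */
    DirtyTag dirty[] = {
        {1663, 16384, 0, 42},
        {1663, 16385, 0, 3},
        {1663, 16384, 0, 7},
        {1663, 16384, 0, 8},
    };
    size_t n = sizeof(dirty) / sizeof(dirty[0]);

    qsort(dirty, n, sizeof(DirtyTag), tag_cmp);

    /* After sorting, writes proceed file by file in block order. */
    for (size_t i = 0; i < n; i++)
        printf("rel %u block %u\n", dirty[i].relation, dirty[i].block);
    return 0;
}
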
On Thu, Sep 4, 2014 at 7:01 PM, Robert Haas wrote:
> On Thu, Sep 4, 2014 at 3:09 AM, Ants Aasma wrote:
>> On Thu, Sep 4, 2014 at 12:36 AM, Andres Freund
>> wrote:
>>> It's imo quite clearly better to keep it allocated. For one, after
>>> postmaster started the checkpointer successfully you d
On Thu, Sep 4, 2014 at 3:09 AM, Ants Aasma wrote:
> On Thu, Sep 4, 2014 at 12:36 AM, Andres Freund wrote:
>> It's imo quite clearly better to keep it allocated. For one, after
>> postmaster started the checkpointer successfully you don't need to be
>> worried about later failures to allocate memor
On Thu, Sep 4, 2014 at 12:36 AM, Andres Freund wrote:
> It's imo quite clearly better to keep it allocated. For one, after
> postmaster started the checkpointer successfully you don't need to be
> worried about later failures to allocate memory if you allocate it once
> (unless the checkpointer FAT
On Sat, Aug 30, 2014 at 8:50 PM, Tom Lane wrote:
> Andres Freund writes:
>> On 2014-08-27 19:23:04 +0300, Heikki Linnakangas wrote:
>>> A long time ago, Itagaki Takahiro wrote a patch to sort the buffers and write
>>> them out in order
>>> (http://www.postgresql.org/message-id/flat/20070614153758.6
On 2014-09-03 17:08:12 -0400, Robert Haas wrote:
> On Sat, Aug 30, 2014 at 2:04 PM, Andres Freund wrote:
> > If the sort buffer is allocated when the checkpointer is started, not
> > every time we sort, as you've done in your version of the patch, I think
> > that risk is pretty manageable. If we re
On Sat, Aug 30, 2014 at 2:04 PM, Andres Freund wrote:
> If the sort buffer is allocated when the checkpointer is started, not
> every time we sort, as you've done in your version of the patch, I think
> that risk is pretty manageable. If we really want to be sure nothing is
> happening at runtime, e
On Tue, Sep 2, 2014 at 8:14 AM, Fabien COELHO wrote:
>
> There is scan_whole_pool_milliseconds, which currently forces bgwriter to
circle the buffer pool at least once every 2 minutes. It is currently
fixed, but it should be trivial to turn it into an experimental guc that
you co
There is scan_whole_pool_milliseconds, which currently forces bgwriter to
circle the buffer pool at least once every 2 minutes. It is currently
fixed, but it should be trivial to turn it into an experimental guc that
you could use to test your hypothesis.
I recompiled with the variable coldly
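
For reference, a rough sketch of the pacing arithmetic behind scan_whole_pool_milliseconds; the buffer count and delay below are example values, not taken from a real server.

#include <stdio.h>

int
main(void)
{
    /* Example values only: 128MB of shared_buffers and the default delay. */
    int nbuffers = 16384;            /* shared buffers, in 8kB pages */
    int bgwriter_delay_ms = 200;     /* bgwriter_delay */
    int scan_whole_pool_ms = 120000; /* currently hard-coded to 2 minutes */

    /* Number of bgwriter rounds available before the deadline, and the
     * minimum buffers each round must advance to circle the whole pool. */
    int rounds = scan_whole_pool_ms / bgwriter_delay_ms;
    int min_scan_per_round = nbuffers / rounds;

    printf("%d rounds, at least %d buffers scanned per round\n",
           rounds, min_scan_per_round);
    return 0;
}
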
On Tue, Aug 26, 2014 at 1:02 AM, Fabien COELHO wrote:
>
> Hello again,
>
>
> I have not found any means to force bgwriter to send writes when it can.
>>> (Well, I have: create a process which sends "CHECKPOINT" every 0.2
>>> seconds... it works more or less, but this is not my point:-)
>>>
>>
>>
Hello Heikki,
For kicks, I wrote a quick & dirty patch for interleaving the fsyncs, see
attached. It works by repeatedly scanning the buffer pool, writing buffers
belonging to a single relation segment at a time.
I tried this patch on the same host I used with the same "-R 25 -L 200 -T
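
A toy control-flow sketch of that interleaving, assuming the dirty blocks are already grouped by segment; the segment names and the write_block()/fsync_segment() helpers below are placeholders, not the actual patch.

#include <stdio.h>
#include <string.h>

typedef struct DirtyBlock
{
    const char *segment;   /* relation segment file the block belongs to */
    int         block;
} DirtyBlock;

static void
write_block(const DirtyBlock *b)
{
    printf("write %s block %d\n", b->segment, b->block);
}

static void
fsync_segment(const char *segment)
{
    printf("fsync %s\n", segment);
}

int
main(void)
{
    /* Already grouped so each segment's blocks are adjacent. */
    DirtyBlock dirty[] = {
        {"16384", 7}, {"16384", 8}, {"16384", 42},
        {"16385", 3}, {"16385", 9},
    };
    int n = sizeof(dirty) / sizeof(dirty[0]);

    for (int i = 0; i < n; i++)
    {
        write_block(&dirty[i]);
        /* Once the last block of a segment is written, fsync it right away
         * instead of leaving all fsyncs to the end of the checkpoint. */
        if (i == n - 1 || strcmp(dirty[i + 1].segment, dirty[i].segment) != 0)
            fsync_segment(dirty[i].segment);
    }
    return 0;
}

Each fsync then only has to flush the small amount of data written since the previous one, instead of everything at once.
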
Hi,
2014-08-31 8:10 GMT+09:00 Andres Freund:
> On 2014-08-31 01:50:48 +0300, Heikki Linnakangas wrote:
>
> If we're going to fsync between each file, there's no need to sort all the
> > buffers at once. It's enough to pick one file as the target - like in my
> > crude patch - and sort only the b
On 2014-08-31 01:50:48 +0300, Heikki Linnakangas wrote:
> On 08/30/2014 09:45 PM, Andres Freund wrote:
> >On 2014-08-30 14:16:10 -0400, Tom Lane wrote:
> >>Andres Freund writes:
> >>>On 2014-08-30 13:50:40 -0400, Tom Lane wrote:
> A possible compromise is to sort a limited number of
> buff
On 08/30/2014 09:45 PM, Andres Freund wrote:
On 2014-08-30 14:16:10 -0400, Tom Lane wrote:
Andres Freund writes:
On 2014-08-30 13:50:40 -0400, Tom Lane wrote:
A possible compromise is to sort a limited number of
buffers: say, collect a few thousand dirty buffers, then sort, dump and
fsync
On 2014-08-30 14:16:10 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2014-08-30 13:50:40 -0400, Tom Lane wrote:
> >> A possible compromise is to sort a limited number of
> >> buffers: say, collect a few thousand dirty buffers, then sort, dump and
> >> fsync them, repeat as needed.
>
>
Andres Freund writes:
> On 2014-08-30 13:50:40 -0400, Tom Lane wrote:
>> A possible compromise is to sort a limited number of
>> buffers: say, collect a few thousand dirty buffers, then sort, dump and
>> fsync them, repeat as needed.
> Yea, that's what I suggested nearby. But I don't really li
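
A minimal sketch of that compromise, using made-up block numbers rather than real buffers: sort and flush a bounded batch at a time, so the sort array never has to cover the whole buffer pool.

#include <stdio.h>
#include <stdlib.h>

#define BATCH 4   /* tiny for the example; "a few thousand" in the proposal */

static int
cmp_block(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;
    return (x > y) - (x < y);
}

int
main(void)
{
    /* Stand-in for the dirty buffers a checkpoint would walk over. */
    int dirty[] = {42, 7, 99, 3, 15, 8, 27, 1, 60};
    int ndirty = sizeof(dirty) / sizeof(dirty[0]);
    int batch[BATCH];
    int n = 0;

    for (int i = 0; i < ndirty; i++)
    {
        batch[n++] = dirty[i];
        if (n == BATCH || i == ndirty - 1)
        {
            qsort(batch, n, sizeof(int), cmp_block);  /* sort this batch only */
            for (int j = 0; j < n; j++)
                printf("write block %d\n", batch[j]);
            printf("fsync\n");   /* flush before collecting the next batch */
            n = 0;
        }
    }
    return 0;
}
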
On 2014-08-30 13:50:40 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2014-08-27 19:23:04 +0300, Heikki Linnakangas wrote:
> >> A long time ago, Itagaki Takahiro wrote a patch to sort the buffers and write
> >> them out in order
> >> (http://www.postgresql.org/message-id/flat/20070614153758.6
Andres Freund writes:
> On 2014-08-27 19:23:04 +0300, Heikki Linnakangas wrote:
>> A long time ago, Itagaki Takahiro wrote a patch to sort the buffers and write
>> them out in order
>> (http://www.postgresql.org/message-id/flat/20070614153758.6a62.itagaki.takah...@oss.ntt.co.jp).
>> The performance
On 2014-08-27 19:23:04 +0300, Heikki Linnakangas wrote:
> A long time ago, Itagaki Takahiro wrote a patch to sort the buffers and write
> them out in order
> (http://www.postgresql.org/message-id/flat/20070614153758.6a62.itagaki.takah...@oss.ntt.co.jp).
> The performance impact of that was inconclusi
I tried that by setting:
vm.dirty_expire_centisecs = 100
vm.dirty_writeback_centisecs = 100
So it should start writing returned buffers at most 2s after they are
returned, if I understood the doc correctly, instead of at most 35s.
The result is that with a 5000s 25tps pretty small load (th
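
For reference, the arithmetic behind the "2s ... instead of at most 35s" figures above, assuming the usual Linux defaults of dirty_expire_centisecs=3000 and dirty_writeback_centisecs=500.

#include <stdio.h>

/* Worst-case age of a dirty page: it becomes eligible for writeback after
 * dirty_expire_centisecs, and the flusher only wakes up every
 * dirty_writeback_centisecs, so the two roughly add up. */
static double
worst_case_seconds(int expire_cs, int writeback_cs)
{
    return (expire_cs + writeback_cs) / 100.0;
}

int
main(void)
{
    printf("kernel defaults: ~%.0fs\n", worst_case_seconds(3000, 500)); /* 35s */
    printf("tuned settings:  ~%.0fs\n", worst_case_seconds(100, 100));  /*  2s */
    return 0;
}
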
On Thu, Aug 28, 2014 at 3:27 AM, Fabien COELHO wrote:
> Hello Aidan,
>
>
>> If all you want is to avoid the write storms when fsyncs start happening
>> on
>> slow storage, can you not just adjust the kernel vm.dirty* tunables to
>> start making the kernel write out dirty buffers much sooner instea
Hello Aidan,
If all you want is to avoid the write storms when fsyncs start happening on
slow storage, can you not just adjust the kernel vm.dirty* tunables to
start making the kernel write out dirty buffers much sooner instead of
letting them accumulate until fsyncs force them out all at once?
On 2014-08-27 19:00:12 +0200, Fabien COELHO wrote:
>
> >off:
> >
> >$ pgbench -p 5440 -h /tmp postgres -M prepared -c 16 -j16 -T 120 -R 180 -L
> >200
> >number of skipped transactions: 1345 (6.246 %)
> >
> >on:
> >
> >$ pgbench -p 5440 -h /tmp postgres -M prepared -c 16 -j16 -T 120 -R 180 -L
> >
off:
$ pgbench -p 5440 -h /tmp postgres -M prepared -c 16 -j16 -T 120 -R 180 -L 200
number of skipped transactions: 1345 (6.246 %)
on:
$ pgbench -p 5440 -h /tmp postgres -M prepared -c 16 -j16 -T 120 -R 180 -L 200
number of skipped transactions: 1 (0.005 %)
That machine is far from idle ri
On 2014-08-27 19:23:04 +0300, Heikki Linnakangas wrote:
> On 08/27/2014 04:20 PM, Andres Freund wrote:
> >On 2014-08-27 10:17:06 -0300, Claudio Freire wrote:
> >>>I think a somewhat smarter version of the explicit flushes in the
> >>>hack^Wpatch I posted nearby is more likely to be success
On 08/27/2014 04:20 PM, Andres Freund wrote:
On 2014-08-27 10:17:06 -0300, Claudio Freire wrote:
I think a somewhat smarter version of the explicit flushes in the
hack^Wpatch I posted nearby is more likely to be successful.
That path is "dangerous" (as in, may not work as intended) i
Hello,
If all you want is to avoid the write storms when fsyncs start happening on
slow storage, can you not just adjust the kernel vm.dirty* tunables to
start making the kernel write out dirty buffers much sooner instead of
letting them accumulate until fsyncs force them out all at once?
I c
On 2014-08-27 10:32:19 -0400, Aidan Van Dyk wrote:
> On Wed, Aug 27, 2014 at 3:32 AM, Fabien COELHO wrote:
>
> >
> > Hello Andres,
> >
> > [...]
> >>
> >> I think you're misunderstanding how spread checkpoints work.
> >>
> >
> > Yep, definitely:-) On the other hand I though I was seeking somethi
On Wed, Aug 27, 2014 at 3:32 AM, Fabien COELHO wrote:
>
> Hello Andres,
>
> [...]
>>
>> I think you're misunderstanding how spread checkpoints work.
>>
>
> Yep, definitely:-) On the other hand I thought I was seeking something
> "simple", namely correct latency under small load, that I would expe
On Wed, Aug 27, 2014 at 10:20 AM, Andres Freund wrote:
> On 2014-08-27 10:17:06 -0300, Claudio Freire wrote:
>> > I think a somewhat smarter version of the explicit flushes in the
>> > hack^Wpatch I posted nearby is more likely to be successful.
>>
>>
>> That path is "dangerous" (as in, m
On 2014-08-27 10:17:06 -0300, Claudio Freire wrote:
> > I think a somewhat smarter version of the explicit flushes in the
> > hack^Wpatch I posted nearby is more likely to be successful.
>
>
> That path is "dangerous" (as in, may not work as intended) if the
> filesystem doesn't properly
On Wed, Aug 27, 2014 at 10:15 AM, Andres Freund wrote:
> On 2014-08-27 10:10:49 -0300, Claudio Freire wrote:
>> On Wed, Aug 27, 2014 at 6:05 AM, Fabien COELHO wrote:
>> >> [...] What's your evidence the pacing doesn't work? Afaik it's the fsync
>> >> that causes the problem, not the the writes th
On 2014-08-27 10:10:49 -0300, Claudio Freire wrote:
> On Wed, Aug 27, 2014 at 6:05 AM, Fabien COELHO wrote:
> >> [...] What's your evidence the pacing doesn't work? Afaik it's the fsync
> >> that causes the problem, not the writes themselves.
> >
> >
> > Hmmm. My (poor) understanding is that f
On Wed, Aug 27, 2014 at 10:10 AM, Claudio Freire wrote:
> On Wed, Aug 27, 2014 at 6:05 AM, Fabien COELHO wrote:
>>> [...] What's your evidence the pacing doesn't work? Afaik it's the fsync
>>> that causes the problem, not the writes themselves.
>>
>>
>> Hmmm. My (poor) understanding is that f
On Wed, Aug 27, 2014 at 6:05 AM, Fabien COELHO wrote:
>> [...] What's your evidence the pacing doesn't work? Afaik it's the fsync
>> that causes the problem, not the writes themselves.
>
>
> Hmmm. My (poor) understanding is that fsync would work fine if everything
> was already written beforeh
Hello Amit,
I see there is some merit in your point which is to make bgwriter more
useful than its current form. I could see 3 top level points to think
about whether improvement in any of those can improve the current
situation:
a. Scanning of buffer pool to find the dirty buffers that ca
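
A toy sketch of point (a), sweeping a descriptor array for dirty buffers; the struct and flag below are simplifications for illustration, not PostgreSQL's real buffer headers.

#include <stdio.h>

#define NBUFFERS   8
#define FLAG_DIRTY 0x01

typedef struct BufDesc
{
    int      buf_id;
    unsigned flags;
} BufDesc;

int
main(void)
{
    BufDesc pool[NBUFFERS] = {
        {0, 0}, {1, FLAG_DIRTY}, {2, 0}, {3, FLAG_DIRTY},
        {4, FLAG_DIRTY}, {5, 0}, {6, 0}, {7, FLAG_DIRTY},
    };
    int ncandidates = 0;

    /* Sweep the pool and count buffers the bgwriter could usefully write. */
    for (int i = 0; i < NBUFFERS; i++)
        if (pool[i].flags & FLAG_DIRTY)
            ncandidates++;

    printf("%d dirty buffers are write candidates\n", ncandidates);
    return 0;
}
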
On 2014-08-27 11:19:22 +0200, Andres Freund wrote:
> On 2014-08-27 11:14:46 +0200, Andres Freund wrote:
> > On 2014-08-27 11:05:52 +0200, Fabien COELHO wrote:
> > > I can test a couple of patches. I already did one on someone's advice (make
> > > bgwriter round all stuff in 1s instead of 120s, withou
On 2014-08-27 11:14:46 +0200, Andres Freund wrote:
> On 2014-08-27 11:05:52 +0200, Fabien COELHO wrote:
> > I can test a couple of patches. I already did one on someone's advice (make
> > bgwriter round all stuff in 1s instead of 120s, without positive effect).
>
> I've quickly cobbled together the a
On 2014-08-27 11:05:52 +0200, Fabien COELHO wrote:
> I can test a couple of patches. I already did one on someone's advice (make
> bgwriter round all stuff in 1s instead of 120s, without positive effect).
I've quickly cobbled together the attached patch (which at least doesn't
seem to crash & burn).
[...] What's your evidence the pacing doesn't work? Afaik it's the fsync
that causes the problem, not the writes themselves.
Hmmm. My (poor) understanding is that fsync would work fine if everything
was already written beforehand:-) that is, it has nothing to do but assess
that all is alr
On 2014-08-27 09:32:16 +0200, Fabien COELHO wrote:
>
> Hello Andres,
>
> >[...]
> >I think you're misunderstanding how spread checkpoints work.
>
> Yep, definitely:-) On the other hand I thought I was seeking something
> "simple", namely correct latency under small load, that I would expect out
>
Hello Andres,
[...]
I think you're misunderstanding how spread checkpoints work.
Yep, definitely:-) On the other hand I thought I was seeking something
"simple", namely correct latency under small load, that I would expect out
of the box.
What you describe is reasonable, and is more or les
On Tue, Aug 26, 2014 at 12:53 PM, Fabien COELHO wrote:
>
> Given the small flow of updates, I do not think that there should be
reason to get that big a write contention between WAL & checkpoint.
>
> I tried with "full_page_writes = off" for 500 seconds: same overall
behavior, 8.5% of transactions
Hello Jeff,
The culprit I found is "bgwriter", which is basically doing nothing to
prevent the coming checkpoint IO storm, even though there would be ample
time to write the accumulating dirty pages so that checkpoint would find a
clean field and pass in a blink. Indeed, at the end of the 500 s
On 2014-08-26 11:34:36 +0200, Fabien COELHO wrote:
>
> >Uh. I'm not surprised you're facing utterly horrible performance with
> >this. Did you try using a *large* checkpoint_segments setting? To
> >achieve high performance
>
> I do not seek "high performance" per se, I seek "lower maximum latenc
Uh. I'm not surprised you're facing utterly horrible performance with
this. Did you try using a *large* checkpoint_segments setting? To
achieve high performance
I do not seek "high performance" per se, I seek "lower maximum latency".
I think that the current settings and parameters are desig
On 2014-08-26 10:49:31 +0200, Fabien COELHO wrote:
>
> >What are the other settings here? checkpoint_segments,
> >checkpoint_timeout, wal_buffers?
>
> They simply are the defaults:
>
> checkpoint_segments = 3
> checkpoint_timeout = 5min
> wal_buffers = -1
>
> I did some test checkpoint_se
What are the other settings here? checkpoint_segments,
checkpoint_timeout, wal_buffers?
They simply are the defaults:
checkpoint_segments = 3
checkpoint_timeout = 5min
wal_buffers = -1
I did some tests with checkpoint_segments = 1; the problem is just more frequent
but shorter. I also reduc
On 2014-08-26 10:25:29 +0200, Fabien COELHO wrote:
> >Did you check whether xfs yields a, err, more predictable performance?
>
> No. I cannot test that easily without reinstalling the box. I did some quick
> tests with ZFS/FreeBSD which seemed to freeze the same, but not in the very
> same conditi
Hello Andres,
checkpoint when the segments are full... the server is unresponsive about
10% of the time (one in ten transactions is late by more than 200 ms).
That's ext4 I guess?
Yes!
Did you check whether xfs yields a, err, more predictable performance?
No. I cannot test that easily wi
On 2014-08-26 08:12:48 +0200, Fabien COELHO wrote:
> As for checkpoint spreading, raising checkpoint_completion_target to 0.9
> degrades the situation (20% of transactions are more than 200 ms late
> instead of 10%; bgwriter wrote less than 1 page per second on a 500s run).
> So maybe there is a
Hello again,
I have not found any means to force bgwriter to send writes when it can.
(Well, I have: create a process which sends "CHECKPOINT" every 0.2
seconds... it works more or less, but this is not my point:-)
There is scan_whole_pool_milliseconds, which currently forces bgwriter to
circl
Hello Amit,
I think another thing to know here is why exactly checkpoint
storm is causing tps to drop so steeply.
Yep. Actually it is not strictly 0, but a "few" tps that I rounded to 0.
progress: 63.0 s, 47.0 tps, lat 2.810 ms stddev 5.194, lag 0.354 ms
progress: 64.1 s, 11.9 tps, lat 8
[oops, wrong from, resent...]
Hello Jeff,
The culprit I found is "bgwriter", which is basically doing nothing to
prevent the coming checkpoint IO storm, even though there would be ample
time to write the accumulating dirty pages so that checkpoint would find a
clean field and pass in a blink.
On Monday, August 25, 2014, Fabien COELHO wrote:
>
> I have not found any means to force bgwriter to send writes when it can.
> (Well, I have: create a process which sends "CHECKPOINT" every 0.2
> seconds... it works more or less, but this is not my point:-)
>
There is scan_whole_pool_millisecond
Hello Josh,
So I think that you're confusing the roles of bgwriter vs. spread
checkpoint. What you're experiencing above is pretty common for
nonspread checkpoints on slow storage (and RAID5 is slow for DB updates,
no matter how fast the disks are), or for attempts to do spread
checkpoint on f
On Tue, Aug 26, 2014 at 1:53 AM, Fabien COELHO wrote:
>
>
> Hello pgdevs,
>
> I've been playing with pg for some time now to try to reduce the maximum
latency of simple requests, to have a responsive server under small to
medium load.
>
> On an old computer with a software RAID5 HDD attached, pgbe
On Monday, August 25, 2014, Fabien COELHO wrote:
>
>
> The culprit I found is "bgwriter", which is basically doing nothing to
> prevent the coming checkpoint IO storm, even though there would be ample
> time to write the accumulating dirty pages so that checkpoint would find a
> clean field and p
Hi,
On 2014-08-25 22:23:40 +0200, Fabien COELHO wrote:
> seconds followed by 16 seconds at about 0 tps for the checkpoint induced IO
> storm. The server is totally unresponsive 75% of the time. That's bandwidth
> optimization for you. Hmmm... why not.
>
> Now, given this setup, if pgbench is thro
On 08/25/2014 01:23 PM, Fabien COELHO wrote:
>
> Hello pgdevs,
>
> I've been playing with pg for some time now to try to reduce the maximum
> latency of simple requests, to have a responsive server under small to
> medium load.
>
> On an old computer with a software RAID5 HDD attached, pgbench s
Hello pgdevs,
I've been playing with pg for some time now to try to reduce the maximum
latency of simple requests, to have a responsive server under small to
medium load.
On an old computer with a software RAID5 HDD attached, pgbench
simple update script run for some time (scale 100, fill