Re: [HACKERS] Dirty pages in freelist cause WAL stuck

2007-02-05 Thread Jim Nasby
I think there's improvement to be made in how we track buffer usage  
in general. Seqscans still hold the same weight as any other  
operation, the freelist is of questionable value, and there's a lot  
of work done to find a free buffer out of the pool, for example.


On Feb 2, 2007, at 8:08 PM, Bruce Momjian wrote:

Is this a TODO item?



--
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Dirty pages in freelist cause WAL stuck

2007-02-02 Thread Bruce Momjian

Is this a TODO item?

---

ITAGAKI Takahiro wrote:
 Simon Riggs [EMAIL PROTECTED] wrote:
 
  I think what you are saying is: VACUUM places blocks so that they are
  immediately reused. This stops shared_buffers from being polluted by
  vacuumed-blocks, but it also means that almost every write becomes a
  backend dirty write when VACUUM is working, bgwriter or not. That also
  means that we flush WAL more often than we otherwise would.
 
 That's right. I think it's acceptable for the vacuuming process to write
 dirty buffers it made itself, because only that process slows down; other
 backends can run undisturbed. However, frequent WAL flushing should be avoided.
 
 I found the problem when I ran VACUUM FREEZE on its own. But if other
 backends were active, dirty buffers made by VACUUM would be reused by those
 backends, not by the vacuuming process.
 
  From above my thinking would be to have a more general implementation:
  Each backend keeps a list of cache buffers to reuse in its local loop,
  rather than using the freelist as a global list. That way the technique
  would work even when we have multiple Vacuums working concurrently. It
  would also then be possible to use this for the SeqScan case as well.
 
 Great idea! The trouble lies in buffer usage by SeqScan and VACUUM: the
 former uses too many buffers and the latter too few. Your cache looping
 would work around both cases.
 
  Another connected thought is the idea of a having a FullBufferList - the
  opposite of a free buffer list. When VACUUM/INSERT/COPY fills a block we
  notify the buffer manager that this block needs writing ahead of other
  buffers, so that the bgwriter can work more effectively. That seems like
  it would help with both this current patch and the additional thoughts
  above.
 
 Do you mean that the bgwriter should take care of buffers in the freelist,
 not only those at the tail of the LRU? We might need activity control for
 the bgwriter; buffers are reused rapidly during VACUUM or bulk inserts, so
 the bgwriter cannot keep up with its usual settings.
 
 Regards,
 ---
 ITAGAKI Takahiro
 NTT Open Source Software Center
 
 
 

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Dirty pages in freelist cause WAL stuck

2006-12-28 Thread Jim C. Nasby
On Tue, Dec 19, 2006 at 05:53:06PM +0900, ITAGAKI Takahiro wrote:
 Simon Riggs [EMAIL PROTECTED] wrote:
  Another connected thought is the idea of a having a FullBufferList - the
  opposite of a free buffer list. When VACUUM/INSERT/COPY fills a block we
  notify the buffer manager that this block needs writing ahead of other
  buffers, so that the bgwriter can work more effectively. That seems like
  it would help with both this current patch and the additional thoughts
  above.
 
 Do you mean that the bgwriter should take care of buffers in the freelist,
 not only those at the tail of the LRU? We might need activity control for
 the bgwriter; buffers are reused rapidly during VACUUM or bulk inserts, so
 the bgwriter cannot keep up with its usual settings.

If I understand the code correctly, the LRU stuff actually only hits
the free list. Also, the only thing that runs the clock sweep (which is
what maintains the LRU-type info) is a backend requesting a page and not
finding one on the free list.
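That reading matches the usual description of the clock sweep: usage_count is bumped on access and decremented only as the hand passes, and the hand advances only when a backend needs a victim buffer. A toy model (Python, with illustrative names only; not the bufmgr code itself):

```python
# Toy clock sweep: touch() never moves the hand; only get_victim()
# (a backend needing a free buffer) runs the sweep.

class ClockSweep:
    def __init__(self, nbuffers, max_usage=5):
        self.usage = [0] * nbuffers   # per-buffer usage_count
        self.hand = 0                 # sweep position
        self.max_usage = max_usage

    def touch(self, buf):
        # Called on buffer access; no sweep activity happens here.
        self.usage[buf] = min(self.usage[buf] + 1, self.max_usage)

    def get_victim(self):
        # The sweep runs only now, when someone needs a buffer.
        while True:
            buf = self.hand
            self.hand = (self.hand + 1) % len(self.usage)
            if self.usage[buf] == 0:
                return buf
            self.usage[buf] -= 1      # age the buffer as the hand passes

sweep = ClockSweep(4)
sweep.touch(0); sweep.touch(0); sweep.touch(1)
print(sweep.get_victim())   # buffer 2: first one found with usage_count == 0
```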
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Dirty pages in freelist cause WAL stuck

2006-12-19 Thread ITAGAKI Takahiro
Simon Riggs [EMAIL PROTECTED] wrote:

 I think what you are saying is: VACUUM places blocks so that they are
 immediately reused. This stops shared_buffers from being polluted by
 vacuumed-blocks, but it also means that almost every write becomes a
 backend dirty write when VACUUM is working, bgwriter or not. That also
 means that we flush WAL more often than we otherwise would.

That's right. I think it's acceptable for the vacuuming process to write
dirty buffers it made itself, because only that process slows down; other
backends can run undisturbed. However, frequent WAL flushing should be avoided.

I found the problem when I ran VACUUM FREEZE on its own. But if other
backends were active, dirty buffers made by VACUUM would be reused by those
backends, not by the vacuuming process.

 From above my thinking would be to have a more general implementation:
 Each backend keeps a list of cache buffers to reuse in its local loop,
 rather than using the freelist as a global list. That way the technique
 would work even when we have multiple Vacuums working concurrently. It
 would also then be possible to use this for the SeqScan case as well.

Great idea! The trouble lies in buffer usage by SeqScan and VACUUM: the
former uses too many buffers and the latter too few. Your cache looping
would work around both cases.

 Another connected thought is the idea of a having a FullBufferList - the
 opposite of a free buffer list. When VACUUM/INSERT/COPY fills a block we
 notify the buffer manager that this block needs writing ahead of other
 buffers, so that the bgwriter can work more effectively. That seems like
 it would help with both this current patch and the additional thoughts
 above.

Do you mean that the bgwriter should take care of buffers in the freelist,
not only those at the tail of the LRU? We might need activity control for
the bgwriter; buffers are reused rapidly during VACUUM or bulk inserts, so
the bgwriter cannot keep up with its usual settings.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center





Re: [HACKERS] Dirty pages in freelist cause WAL stuck

2006-12-18 Thread Simon Riggs
On Mon, 2006-12-18 at 11:55 +0900, ITAGAKI Takahiro wrote:

 I'm testing the recent changes to WAL entries for freezing tuples.
 VACUUM FREEZE now takes more time; the cause seems to be flushing of WAL buffers.

Great thinking.

 Vacuuming processes put freed buffers into the freelist. Buffers in the
 freelist are preferentially used on the next buffer allocation, and if such
 a buffer is dirty, the allocator must write it out before reuse. However,
 since the freelist typically holds few buffers, recently dirtied buffers are
 reused too soon -- the WAL entries for the dirty buffer have not been flushed
 yet, so the allocator must flush WAL, write the buffer, and only then reuse it.

I think what you are saying is: VACUUM places blocks so that they are
immediately reused. This stops shared_buffers from being polluted by
vacuumed-blocks, but it also means that almost every write becomes a
backend dirty write when VACUUM is working, bgwriter or not. That also
means that we flush WAL more often than we otherwise would.

 One solution is to always keep some buffers in the freelist. With N buffers
 in the freelist, the need for WAL flushing is reduced to roughly 1/N, because
 one flush covers the WAL entries for all of them.

That sounds very similar to an idea I'd been working on which I'd called
cache looping. There is a related (but opposite) problem with sequential
scans - they don't move through the cache fast enough. A solution to
both issues is to have the Vacuum/SeqScans continually reuse a small
pool of buffers, rather than request the next one from the buffer
manager in the normal way.

 The attached patch is an experimental implementation of the above. Keeping
 32 buffers seems to be enough when the vacuum runs on its own; with
 background jobs, other numbers may be better.
 
  N | time  | XLogWrite/XLogFlush
 ---+---+-
  1 | 68.2s | 25.6%
  8 | 57.4s | 10.8%
 32 | 54.0s |  3.4%

I think this is good proof; well done.

From above my thinking would be to have a more general implementation:
Each backend keeps a list of cache buffers to reuse in its local loop,
rather than using the freelist as a global list. That way the technique
would work even when we have multiple Vacuums working concurrently. It
would also then be possible to use this for the SeqScan case as well.

Cache looping would be implemented by a modified BufferAlloc routine,
say BufferScanAlloc() that is called only when a StrategyUseCacheLoop()
has been called during SeqScan or VacuumScan. strategy_cache_loop would
replace strategy_hint_vacuum.

Each backend would have a list of the previous N buffers touched. When N
= Nmax, we would link to the oldest buffer to form a ring. Each time
we need the next buffer we read from the ring rather than from the main
clock sweep. If the buffer identified is pinned, we drop it from
the ring, request a new buffer in the normal way, and keep that instead. At
the end of the scan, we simply forget the buffer ring.
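A minimal sketch of that ring (Python, with invented names; the real thing would live in the buffer manager): the scan fills a private ring of up to Nmax buffers, then cycles through it; a pinned buffer is dropped from the ring and replaced by one obtained from the shared pool in the normal way.

```python
# Toy model of "cache looping": a scan reuses a small private ring of
# buffers instead of asking the global clock sweep for each page.

class BufferRing:
    def __init__(self, nmax, alloc_from_pool):
        self.nmax = nmax
        self.alloc = alloc_from_pool  # fallback: normal buffer allocation
        self.ring = []
        self.next = 0

    def get_buffer(self, is_pinned):
        if len(self.ring) < self.nmax:
            buf = self.alloc()        # still filling the ring
            self.ring.append(buf)
            return buf
        buf = self.ring[self.next]
        if is_pinned(buf):
            # Someone else holds this buffer: evict it from the ring
            # and take a replacement from the shared pool instead.
            buf = self.alloc()
            self.ring[self.next] = buf
        self.next = (self.next + 1) % self.nmax
        return buf

# A 4-buffer ring touches only 4 shared buffers over a 100-page scan.
counter = iter(range(1000))           # stand-in for the shared pool
ring = BufferRing(4, lambda: next(counter))
used = {ring.get_buffer(lambda b: False) for _ in range(100)}
print(sorted(used))                   # [0, 1, 2, 3]
```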

Another connected thought is the idea of a having a FullBufferList - the
opposite of a free buffer list. When VACUUM/INSERT/COPY fills a block we
notify the buffer manager that this block needs writing ahead of other
buffers, so that the bgwriter can work more effectively. That seems like
it would help with both this current patch and the additional thoughts
above.
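As a hypothetical sketch of that FullBufferList (Python; every name here is invented, since no such list exists in the code): bulk writers push just-filled dirty buffers onto a queue that the bgwriter drains before its usual LRU-tail pass.

```python
# Toy FullBufferList: the opposite of a free list -- buffers that
# should be written *first*, ahead of the bgwriter's normal scan.

from collections import deque

full_buffer_list = deque()

def notify_full(buf):
    """Called by VACUUM/INSERT/COPY when it finishes filling a block."""
    full_buffer_list.append(buf)

def bgwriter_round(write_page, lru_candidates, budget):
    """One bgwriter cycle: drain the priority queue, then the LRU tail."""
    written = []
    while full_buffer_list and budget > 0:
        written.append(full_buffer_list.popleft())
        budget -= 1
    for buf in lru_candidates:
        if budget == 0:
            break
        written.append(buf)
        budget -= 1
    for buf in written:
        write_page(buf)
    return written

notify_full("A"); notify_full("B")
print(bgwriter_round(lambda b: None, ["x", "y"], budget=3))  # ['A', 'B', 'x']
```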

 [initial data]
 $ pgbench -s 40 -i;
 # VACUUM FREEZE
 [test]
 # UPDATE accounts SET aid=aid WHERE random() < 0.005;
 # checkpoint;
 # VACUUM FREEZE accounts;
 
 
 I cannot see the above problem with a non-freeze vacuum. The number of
 buffers in the freelist grows during the index-vacuuming phase: when the
 vacuum finds seldom-used buffers (refcount == 0 and usage_count == 0), it
 adds them to the freelist. So the WAL entries generated in the index-vacuuming
 or heap-vacuuming phase are not such a problem. Entries for FREEZE, however,
 are generated in the heap-scanning phase, which comes before index-vacuuming.

This also happens when setting hint bits in normal operation, though that
might occur only once in most test situations. In practice, it can occur
each time we touch a row and then VACUUM, so we end up re-writing the
block many times in the way you describe.

IIRC Heikki was thinking of altering the way VACUUM works to avoid it
writing out blocks that it was going to come back to in the second phase
anyway. That would go some way to alleviating the problem you describe,
but wouldn't go as far as the technique you suggest.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Dirty pages in freelist cause WAL stuck

2006-12-18 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 I think what you are saying is: VACUUM places blocks so that they are
 immediately reused. This stops shared_buffers from being polluted by
 vacuumed-blocks, but it also means that almost every write becomes a
 backend dirty write when VACUUM is working, bgwriter or not. That also
 means that we flush WAL more often than we otherwise would.

Do we care?  As long as the writes are done by the vacuum process, ISTM
this is taking load off the foreground query processes, by saving them
from having to do writes.

In any case, I'm unclear on why we should add a boatload of complexity
to improve performance of something that's done as rarely as VACUUM
FREEZE is.  Quite aside from maintainability concerns, even a few extra
cycles added to the more common code paths would make it a net
performance loss overall.

regards, tom lane



Re: [HACKERS] Dirty pages in freelist cause WAL stuck

2006-12-18 Thread Simon Riggs
On Mon, 2006-12-18 at 11:13 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  I think what you are saying is: VACUUM places blocks so that they are
  immediately reused. This stops shared_buffers from being polluted by
  vacuumed-blocks, but it also means that almost every write becomes a
  backend dirty write when VACUUM is working, bgwriter or not. That also
  means that we flush WAL more often than we otherwise would.
 
 Do we care?  As long as the writes are done by the vacuum process, ISTM
 this is taking load off the foreground query processes, by saving them
 from having to do writes.

I'm not bothered about speeding up VACUUM FREEZE at all, but the effect
noted by Itagaki-san is clearly real and so can easily affect other
processes. I believe it does affect other backends, and I had already noted
what I thought was that effect myself. If we had better server
instrumentation it would be easy to demonstrate either way.

 In any case, I'm unclear on why we should add a boatload of complexity
 to improve performance of something that's done as rarely as VACUUM
 FREEZE is.  Quite aside from maintainability concerns, even a few extra
 cycles added to the more common code paths would make it a net
 performance loss overall.

As I noted, this isn't just VACUUM FREEZE (why would it be?), but all
VACUUMs - that *is* a common code path on a busy system. VACUUM FREEZE
simply dirties more blocks and has a more clearly noticeable effect.

From your comments we clearly need more testing to demonstrate the
effect on normal backends before we move to a solution.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





[HACKERS] Dirty pages in freelist cause WAL stuck

2006-12-17 Thread ITAGAKI Takahiro
Hello,

I'm testing the recent changes to WAL entries for freezing tuples.
VACUUM FREEZE now takes more time; the cause seems to be flushing of WAL buffers.

Vacuuming processes put freed buffers into the freelist. Buffers in the
freelist are preferentially used on the next buffer allocation, and if such
a buffer is dirty, the allocator must write it out before reuse. However,
since the freelist typically holds few buffers, recently dirtied buffers are
reused too soon -- the WAL entries for the dirty buffer have not been flushed
yet, so the allocator must flush WAL, write the buffer, and only then reuse it.
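The reuse-forces-a-flush behaviour described above can be sketched in a toy model (plain Python with invented names, not PostgreSQL code): reusing a just-dirtied buffer forces a WAL flush up to that page's LSN (the WAL-before-data rule), while reusing a buffer whose WAL is already on disk costs nothing.

```python
# Toy model: each dirty page carries the LSN of its last WAL record;
# before the page can be written out and reused, WAL must be flushed
# at least up to that LSN.

class WalModel:
    def __init__(self):
        self.flushed_lsn = 0   # WAL durably on disk up to this point
        self.flush_count = 0   # forced flushes -- the cost we care about

    def flush_up_to(self, lsn):
        if lsn > self.flushed_lsn:
            self.flushed_lsn = lsn   # one flush covers all earlier records
            self.flush_count += 1

def reuse_dirty_buffer(wal, page_lsn):
    """Before writing a dirty page, WAL must be flushed past its LSN."""
    wal.flush_up_to(page_lsn)

# VACUUM FREEZE dirties pages with ever-increasing LSNs and, with a
# tiny freelist, reuses each one almost immediately:
wal = WalModel()
for lsn in range(1, 11):
    reuse_dirty_buffer(wal, lsn)     # reused right after being dirtied
print(wal.flush_count)               # 10 flushes for 10 reuses
```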


One solution is to always keep some buffers in the freelist. With N buffers
in the freelist, the need for WAL flushing is reduced to roughly 1/N, because
one flush covers the WAL entries for all of them.

The attached patch is an experimental implementation of the above. Keeping
32 buffers seems to be enough when the vacuum runs on its own; with
background jobs, other numbers may be better.

 N | time  | XLogWrite/XLogFlush
---+---+-
 1 | 68.2s | 25.6%
 8 | 57.4s | 10.8%
32 | 54.0s |  3.4%
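A toy model (Python; illustrative only, not the patch itself) reproduces the shape of these numbers: with a freelist of depth N, the buffer reused at each step was dirtied N steps earlier, and a single flush of all WAL generated so far also covers the next several reuses, so the flush count drops roughly as 1/N.

```python
def flushes_with_freelist_depth(steps, n):
    """Dirty one page per step (page LSN = step number); reuse the page
    dirtied n steps earlier.  Assume a flush writes all WAL generated
    so far, so one flush also covers the following reuses."""
    flushed_lsn = 0
    flushes = 0
    for step in range(1, steps + 1):
        reuse_lsn = step - n              # LSN of the page being reused
        if 0 < reuse_lsn and reuse_lsn > flushed_lsn:
            flushed_lsn = step            # flush up to the current WAL end
            flushes += 1
    return flushes

for n in (1, 8, 32):
    print(n, flushes_with_freelist_depth(1000, n))
# Flush count drops from 500 (n=1) to 111 (n=8) to 30 (n=32).
```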

[initial data]
$ pgbench -s 40 -i;
# VACUUM FREEZE
[test]
# UPDATE accounts SET aid=aid WHERE random() < 0.005;
# checkpoint;
# VACUUM FREEZE accounts;


I cannot see the above problem with a non-freeze vacuum. The number of
buffers in the freelist grows during the index-vacuuming phase: when the
vacuum finds seldom-used buffers (refcount == 0 and usage_count == 0), it
adds them to the freelist. So the WAL entries generated in the index-vacuuming
or heap-vacuuming phase are not such a problem. Entries for FREEZE, however,
are generated in the heap-scanning phase, which comes before index-vacuuming.

Are there any better fixes? Comments welcome.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


freelist_buffers.patch
Description: Binary data
