Alvaro Herrera wrote:
ITAGAKI Takahiro wrote:
I have some comments about the double-buffering:
Since posting this patch I have realized that this implementation is
bogus. I'm now playing with WAL-logging hint bits though.
Yeah, the torn page + hint bit updates problem is the tough question.
- Is it ok to allocate dblbuf[BLCKSZ] as a local variable?
It might be unaligned. AFAICS we avoid such usages in other places.
I thought about that too. I admit I am not sure if this really works
portably; however, I don't want to add a palloc() to that routine.
It should work, AFAIK, but unaligned memcpy()s and write()s can be
significantly slower. There can be only one write() happening at any
time, so you could just palloc() a single 8k buffer in TopMemoryContext
at backend startup, and always use that.
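A minimal sketch of the single-buffer idea, under stated assumptions: this is standalone C, not PostgreSQL code. malloc() stands in for a palloc() in TopMemoryContext, BLCKSZ is hard-coded to PostgreSQL's default of 8192, and the names `dblbuf`, `copy_page_for_write` and `write_page_double_buffered` are hypothetical. The point it shows is that the bytes handed to write() come from a private, heap-allocated (and therefore suitably aligned) copy rather than a char array on the stack.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192 /* PostgreSQL's default block size */

/* Hypothetical: one buffer per process, allocated on first use.
 * malloc() stands in for palloc() in TopMemoryContext. */
static char *dblbuf = NULL;

/* Copy the shared page into the private buffer and return the copy.
 * malloc() memory is suitably aligned for any standard type, unlike
 * a char array declared on the stack. Returns NULL on OOM. */
static const char *copy_page_for_write(const char *page)
{
    if (dblbuf == NULL)
        dblbuf = malloc(BLCKSZ);
    if (dblbuf == NULL)
        return NULL;
    memcpy(dblbuf, page, BLCKSZ);
    return dblbuf;
}

/* Write the private copy, so later changes to the shared page (such as
 * hint-bit updates) cannot alter the bytes being written.
 * Returns the write() result, or -1 if the buffer could not be allocated. */
static ssize_t write_page_double_buffered(int fd, const char *page)
{
    const char *copy = copy_page_for_write(page);
    if (copy == NULL)
        return -1;
    return write(fd, copy, BLCKSZ);
}
```

Reusing one static buffer relies on the observation above that a backend only has one write() in flight at a time; it would not be safe if several threads in the same process wrote pages concurrently.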
- Are there any other modules that can share in the benefits of
double-buffering? For example, we could avoid waiting for
LockBufferForCleanup(). It would be nice if the double-buffering could
be used for multiple purposes.
Not sure on this.
You'd need to keep both versions of the buffer simultaneously in the
buffer cache for that. When we talked about the various designs for HOT,
that was one of the ideas I had to enable more aggressive pruning: if
you can't immediately get a vacuum lock, allocate a new buffer in the
buffer cache for the same block, copy the page to the new buffer, and do
the pruning, including moving tuples around, there. Any new ReadBuffer
calls would return the new page version, but old readers would keep
referencing the old one. The intrusive part of that approach, in
addition to the obvious changes required in the buffer manager to keep
around multiple copies of the same block, is that all modifications must
be done on the new version, so anyone who needs to lock the page for
modification would need to switch to the new page version at the
LockBuffer call.
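The multi-version idea described above can be sketched as a toy, single-threaded model. Everything here is hypothetical and heavily simplified: the types and functions are illustrative names, not PostgreSQL's buffer manager API, there are no content locks, and `mark_pruned` merely overwrites one byte as a stand-in for real pruning. It shows new readers getting the newest version, old readers keeping the old one until they unpin, and a would-be modifier having to switch to the current version first.

```c
#include <stdlib.h>
#include <string.h>

#define BLCKSZ 8192

/* Hypothetical toy model: one block that may have several page
 * versions alive at once. Not PostgreSQL's buffer manager. */
typedef struct PageVersion {
    int  pins;              /* readers still referencing this version */
    char data[BLCKSZ];
} PageVersion;

typedef struct BlockSlot {
    PageVersion *current;   /* version handed out to new readers */
} BlockSlot;

/* New readers always get (and pin) the newest version. */
static PageVersion *read_block(BlockSlot *slot)
{
    slot->current->pins++;
    return slot->current;
}

/* Drop a pin; an old version is freed once its last reader leaves. */
static void release_block(BlockSlot *slot, PageVersion *v)
{
    if (--v->pins == 0 && v != slot->current)
        free(v);
}

/* Stand-in for pruning: just marks the page. */
static void mark_pruned(char *page)
{
    page[0] = 'P';
}

/* Instead of waiting for a cleanup lock, copy the page, apply the
 * modification to the copy, and make the copy current. Readers that
 * already pinned the old version keep using it. */
static PageVersion *modify_without_waiting(BlockSlot *slot,
                                           void (*modify)(char *page))
{
    PageVersion *old = slot->current;
    PageVersion *nv = malloc(sizeof *nv);
    if (nv == NULL)
        return NULL;
    memcpy(nv->data, old->data, BLCKSZ);
    nv->pins = 0;
    modify(nv->data);
    slot->current = nv;
    if (old->pins == 0)
        free(old);          /* nobody was reading it after all */
    return nv;
}

/* A reader holding an old version that now wants to modify the page
 * must move its pin to the current version first. */
static PageVersion *switch_to_current(BlockSlot *slot, PageVersion *v)
{
    if (v == slot->current)
        return v;
    PageVersion *cur = read_block(slot);
    release_block(slot, v);
    return cur;
}
```

The toy makes the intrusive part visible: every code path that modifies a page would need a step like `switch_to_current`, in addition to the buffer manager learning to track more than one copy of the same block.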
As discussed in the other thread with Simon, we also use vacuum locks in
b-tree for waiting out index scans, so avoiding the waiting there would
be just wrong.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)