Alvaro Herrera wrote:
ITAGAKI Takahiro wrote:

I have some comments about the double-buffering:

Since posting this patch I have realized that this implementation is
bogus. I'm now playing with WAL-logging hint bits though.

Yeah, the torn page + hint bit updates problem is the tough question.

- Is it ok to allocate dblbuf[BLCKSZ] as a local variable?
  It might be unaligned. AFAICS we avoid such usages in other places.

I thought about that too.  I admit I am not sure if this really works
portably; however I don't want to add a palloc() to that routine.

It should work, AFAIK, but unaligned memcpy()s and write()s can be significantly slower. There can be only one write() happening at any time, so you could just palloc() a single 8k buffer in TopMemoryContext at backend startup, and always use that.
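
A minimal standalone sketch of that suggestion, not the patch's actual code: one aligned page-sized buffer is allocated once per process and reused for every write. It uses C11 aligned_alloc() as a stand-in for palloc() in TopMemoryContext so that it compiles outside the PostgreSQL tree; the names dblbuf_init, dblbuf_snapshot and DBLBUF_ALIGN, and the 512-byte alignment, are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BLCKSZ       8192   /* PostgreSQL's default block size */
#define DBLBUF_ALIGN 512    /* assumed alignment, chosen for illustration */

/* One private buffer per process, allocated once and never freed. */
static char *dblbuf = NULL;

/*
 * Allocate the buffer at startup. Safe to call more than once.
 * Returns 0 on success, -1 if the allocation failed.
 */
int
dblbuf_init(void)
{
    if (dblbuf != NULL)
        return 0;
    /* BLCKSZ is a multiple of DBLBUF_ALIGN, as aligned_alloc() requires. */
    dblbuf = aligned_alloc(DBLBUF_ALIGN, BLCKSZ);
    return (dblbuf != NULL) ? 0 : -1;
}

/*
 * Copy a page into the private buffer and return the copy. The caller
 * would hand the returned pointer to write(); later changes to the
 * original page (such as hint bit updates) do not affect the copy.
 */
const char *
dblbuf_snapshot(const char *page)
{
    assert(dblbuf != NULL);     /* dblbuf_init() must have succeeded */
    memcpy(dblbuf, page, BLCKSZ);
    return dblbuf;
}
```

This keeps the allocation out of the write path and gives the copy a known alignment, which a BLCKSZ-sized local array does not guarantee.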

- Are there any other modules that can share in the benefits of
  double-buffering? For example, we could avoid waiting for
  LockBufferForCleanup(). It would be cool if the double-buffering could
  be used for multiple purposes.

Not sure on this.

You'd need to keep both versions of the buffer simultaneously in the buffer cache for that. When we talked about the various designs for HOT, that was one of the ideas I had to enable more aggressive pruning: if you can't immediately get a vacuum lock, allocate a new buffer in the buffer cache for the same block, copy the page to the new buffer, and do the pruning, including moving tuples around, there. Any new ReadBuffer calls would return the new page version, but old readers would keep referencing the old one. The intrusive part of that approach, in addition to the obvious changes required in the buffer manager to keep around multiple copies of the same block, is that all modifications must be done on the new version, so anyone who needs to lock the page for modification would need to switch to the new page version at the LockBuffer call.
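
To make the idea concrete, here is a toy model in plain C; it is not PostgreSQL buffer manager code. All names (PageVersion, BlockSlot, read_buffer, fork_for_pruning, lock_for_modification) are hypothetical, and locking, eviction and freeing of old versions are left out entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLCKSZ 8192

/* One in-memory copy of a disk block. */
typedef struct PageVersion
{
    char                data[BLCKSZ];
    int                 pin_count;  /* readers/writers using this copy */
    struct PageVersion *newer;      /* next version, or NULL if latest */
} PageVersion;

/* Lookup entry for one block: always points at the latest version. */
typedef struct BlockSlot
{
    PageVersion *current;
} BlockSlot;

/* New readers always get (and pin) the latest version. */
PageVersion *
read_buffer(BlockSlot *slot)
{
    slot->current->pin_count++;
    return slot->current;
}

void
release_buffer(PageVersion *v)
{
    assert(v->pin_count > 0);
    v->pin_count--;
}

/*
 * Called when a cleanup lock is not immediately available: copy the page
 * into a new, unpinned version and publish it as the latest. The caller
 * may then prune and move tuples in the copy while old readers keep
 * looking at the old one. Returns NULL if out of memory.
 */
PageVersion *
fork_for_pruning(BlockSlot *slot)
{
    PageVersion *old = slot->current;
    PageVersion *copy = malloc(sizeof *copy);

    if (copy == NULL)
        return NULL;
    memcpy(copy->data, old->data, BLCKSZ);
    copy->pin_count = 0;
    copy->newer = NULL;
    old->newer = copy;
    slot->current = copy;
    return copy;
}

/*
 * The intrusive part: anyone about to modify the page must first move
 * their pin from a possibly stale version to the latest one.
 */
PageVersion *
lock_for_modification(PageVersion *v)
{
    while (v->newer != NULL)
    {
        PageVersion *next = v->newer;

        release_buffer(v);
        next->pin_count++;
        v = next;
    }
    return v;
}
```

A reader that pinned the page before fork_for_pruning() keeps seeing the old contents, a later read_buffer() call returns the new version, and lock_for_modification() moves a stale pin forward before any change is made.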

As discussed in the other thread with Simon, we also use vacuum locks in b-tree for waiting out index scans, so avoiding the waiting there would be just wrong.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
