Jeff Davis wrote:
On Mon, 2007-03-05 at 21:02 -0700, Jim Nasby wrote:
On Mar 5, 2007, at 2:03 PM, Heikki Linnakangas wrote:
Another approach I proposed back in December is to not have a variable like that at all, but to scan the buffer cache for pages belonging to the table you're scanning in order to initialize the scan. Scanning all the BufferDescs is a fairly CPU- and lock-heavy operation, but it might be OK given that we're talking about large, I/O-bound sequential scans. It would require no DBA tuning and would work more robustly in varying conditions. I'm not sure where you would continue after scanning the in-cache pages. At the highest in-cache block number, perhaps.
If there was some way to do that, it'd be what I'd vote for.
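
As a rough illustration of the idea quoted above (scan every buffer descriptor, pick out the pages of one relation, and continue from the highest in-cache block number), here is a minimal standalone sketch in C. It is a toy model, not PostgreSQL code: ToyBufferDesc, INVALID_BLOCK and highest_cached_block are hypothetical names invented for this example, and it ignores the locking that a real scan of shared buffer headers would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define INVALID_BLOCK (-1L)

/* Hypothetical, simplified stand-in for a buffer descriptor. */
typedef struct {
    bool     valid;     /* does the buffer hold a valid page? */
    unsigned rel_id;    /* relation the cached page belongs to */
    long     block_no;  /* block number within that relation */
} ToyBufferDesc;

/*
 * Visit every descriptor once and return the highest cached block
 * number for rel_id, or INVALID_BLOCK if none of its pages is cached.
 * The cost is linear in the number of buffers, regardless of how many
 * pages of the relation are actually in the cache.
 */
static long
highest_cached_block(const ToyBufferDesc *bufs, size_t nbufs, unsigned rel_id)
{
    long highest = INVALID_BLOCK;

    assert(bufs != NULL);
    for (size_t i = 0; i < nbufs; i++) {
        /* Skip invalid buffers and pages of other relations. */
        if (!bufs[i].valid || bufs[i].rel_id != rel_id)
            continue;
        if (bufs[i].block_no > highest)
            highest = bufs[i].block_no;
    }
    return highest;
}
```

A caller would start its sequential scan at the returned block, or at block 0 when the result is INVALID_BLOCK. The answer can be stale by the time it is used, since pages may be evicted or loaded concurrently, so it is only a starting hint.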


I still don't know how to make this take advantage of the OS buffer
cache.

Yep, I don't see any way to do that. I think we could live with that, though. If we went with the sync_scan_offset approach, you'd have to leave a lot of safety margin in that as well.

However, I agree that requiring no DBA tuning is a huge advantage.

If I were to implement this idea, I think Heikki's bitmap of pages
already read is the way to go. Can you guys give me some pointers about
how to walk through the shared buffers, reading the pages that I need,
while being sure not to read a page that's been evicted, and also not
potentially causing a performance regression somewhere else?
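
For what it's worth, the "bitmap of pages already read" part could look roughly like the sketch below: one bit per block of the relation, set when a page has been processed from the cache, so that the later sequential pass can skip it. This is only a standalone illustration under my own assumptions. BlockBitmap and the bitmap_* and blocks_remaining functions are hypothetical names, not existing PostgreSQL APIs, and the sketch says nothing about how to pin buffers safely.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bitmap with one bit per block of the relation. */
typedef struct {
    size_t         nblocks;
    unsigned char *bits;
} BlockBitmap;

/* Allocate a zeroed bitmap; returns false if allocation fails. */
static bool
bitmap_init(BlockBitmap *bm, size_t nblocks)
{
    bm->nblocks = nblocks;
    bm->bits = calloc(nblocks / CHAR_BIT + 1, 1);
    return bm->bits != NULL;
}

/* Mark a block as already read (for example, found in the cache). */
static void
bitmap_set(BlockBitmap *bm, size_t blk)
{
    assert(blk < bm->nblocks);
    bm->bits[blk / CHAR_BIT] |= (unsigned char) (1u << (blk % CHAR_BIT));
}

/* Has this block already been read? */
static bool
bitmap_test(const BlockBitmap *bm, size_t blk)
{
    assert(blk < bm->nblocks);
    return ((bm->bits[blk / CHAR_BIT] >> (blk % CHAR_BIT)) & 1u) != 0;
}

/* Count the blocks the sequential pass still has to read from disk. */
static size_t
blocks_remaining(const BlockBitmap *bm)
{
    size_t n = 0;

    for (size_t blk = 0; blk < bm->nblocks; blk++)
        if (!bitmap_test(bm, blk))
            n++;
    return n;
}

static void
bitmap_free(BlockBitmap *bm)
{
    free(bm->bits);
    bm->bits = NULL;
    bm->nblocks = 0;
}
```

The scan would first walk the buffer cache and call bitmap_set for each page of the relation it manages to pin and process, then read blocks 0 to nblocks - 1 in order, skipping any block for which bitmap_test returns true. A page evicted before the first pass reaches it simply stays unset and is read from disk in the second pass.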

You could take a look at BufferSync, for example. It walks through the buffer cache, syncing all dirty buffers.

FWIW, I've attached a function I wrote some time ago when I was playing with the same idea for vacuums. A call to the new function loops through the buffer cache and returns the next buffer that belongs to a certain relation. I'm not sure that it's correct and safe, and there aren't many comments, but it should work if you want to play with it...

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/storage/buffer/bufmgr.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.214
diff -c -r1.214 bufmgr.c
*** src/backend/storage/buffer/bufmgr.c	5 Jan 2007 22:19:37 -0000	1.214
--- src/backend/storage/buffer/bufmgr.c	22 Jan 2007 16:38:37 -0000
***************
*** 97,102 ****
--- 97,134 ----
  static void AtProcExit_Buffers(int code, Datum arg);
  
  
+ Buffer
+ ReadAnyBufferForRelation(Relation reln)
+ {
+ 	static int last_buf_id = 0;
+ 	int new_buf_id;
+ 	volatile BufferDesc *bufHdr;
+ 
+ 	/* Make sure we will have room to remember the buffer pin */
+ 	ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
+ 
+ 	new_buf_id = last_buf_id;
+ 	do
+ 	{
+ 		if (++new_buf_id >= NBuffers)
+ 			new_buf_id = 0;
+ 
+ 		bufHdr = &BufferDescriptors[new_buf_id];
+ 		LockBufHdr(bufHdr);
+ 
+ 		if ((bufHdr->flags & BM_VALID) && RelFileNodeEquals(bufHdr->tag.rnode, reln->rd_node))
+ 		{
+ 			PinBuffer_Locked(bufHdr);
+ 			last_buf_id = new_buf_id;
+ 			return BufferDescriptorGetBuffer(bufHdr);
+ 		}
+ 		UnlockBufHdr(bufHdr);
+ 	} while(new_buf_id != last_buf_id);
+ 	last_buf_id = new_buf_id;
+ 	return InvalidBuffer;
+ }
+ 
+ 
  /*
   * ReadBuffer -- returns a buffer containing the requested
   *		block of the requested relation.  If the blknum