Re: [PATCHES] Proof-of-concept ARC removal patches
> Tom Lane wrote > Attached are two different variants of a patch to remove the ARC > cache algorithm in favor of variants of the "2Q" algorithm. > > The first patch approximates the "full 2Q" algorithm described by > Johnson and Shasha, while the second approximates their > "simplified 2Q" > method. Full 2Q uses a list of pages that were recently in cache but > no longer are, while simplified 2Q does not; the first patch is > therefore a smaller change to the existing code. "Full 2Q" is the right one, I think. > The patches are not exactly the J&S algorithms, in part because I kept > the special cases for VACUUM, and in part because I really > don't believe > their idea about not promoting from T1 into T2 until the page > has fallen > into B1. This helped to minimize the change from the > existing ARC code. Agreed. > In some desultory testing with pgbench, there was no significant > performance difference among the three algorithms, which leads me to > think that simplified 2Q might be the best bet (it certainly has the > smallest memory footprint). But it would be a good idea to do some > more measurements before believing that. I hope that Mark > Wong can give > these a try on his setup soon. Will give these a whirl... > Comments? > The use of 25% T1 and 75% T2 is probably the only thing to discuss, for me. Johnson & Sasha's results show that having a larger T1 makes the system more responsive to changes, while a larger T2 gives a better longer term hit rate. J&S's results show that higher T1/T2 ratios are better with smaller caches. The CARS results show that keeping T1 above a certain minimum size is better also. >From that, I'd suggest we choose a T1/T2 ratio that changes according to how you start shared_buffers. If we had another GUC at all, it should be one that isn't listed in the postgresql.conf file at all...since it would be easy to misuse. A default setting of something like T1 as % of total (just roughly) = 50% when shared_buffers <= 1000 = 25% when shared_buffers = 5000 = 10% when shared_buffers = 2 with smoothing... Best Regards, Simon Riggs ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] [pgsql-patches] Proof-of-concept ARC removal patches
> Date: Thu, 03 Feb 2005 21:41:52 -0500 > From: Tom Lane <[EMAIL PROTECTED]> > Subject: Proof-of-concept ARC removal patches Re: subject -- shouldn't it be "replacement" and _not_ "removal" of ARC? > Attached are two different variants of a patch to remove the ARC > cache algorithm in favor of variants of the "2Q" algorithm. > ... > In some desultory testing with pgbench, there was no significant > performance difference among the three algorithms, which leads me to > think that simplified 2Q might be the best bet (it certainly has the > smallest memory footprint). But it would be a good idea to do some > more measurements before believing that. I hope that Mark Wong can give > these a try on his setup soon. > ... > > Comments? Wouldn't it been much more usable and fair to be able to "load" any of those modules as I am, the user see fit? So, I could swap the algos on the fly whichever better suits me? That will also lay down ground suitable for easier performance testing between the modules should people write their own, new ones. I'd keep the ARC around and not remove as a proof-of-concept module as well. Whatever the status of the IBM's ARC patent may become, the US customers will simply not use the module. Researchers, however, will always be able to tell that their modules are better or worse by comparing them to ARC (or 2Q or whatever). > regards, tom lane -- Serguei A. Mokhov| /~\The ASCII Computer Science Department | \ / Ribbon Campaign Concordia University | XAgainst HTML Montreal, Quebec, Canada | / \ Email! ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
[PATCHES] Proof-of-concept ARC removal patches
Attached are two different variants of a patch to remove the ARC cache algorithm in favor of variants of the "2Q" algorithm. The first patch approximates the "full 2Q" algorithm described by Johnson and Shasha, while the second approximates their "simplified 2Q" method. Full 2Q uses a list of pages that were recently in cache but no longer are, while simplified 2Q does not; the first patch is therefore a smaller change to the existing code. To show how closely the code is related, I kept the ARC names of T1, T2, and B1 for the corresponding 2Q lists (A1, Am, A1out respectively). We'd probably rename to use the 2Q names if we apply this, but the patches are smaller and easier to review as they are. The patches are not exactly the J&S algorithms, in part because I kept the special cases for VACUUM, and in part because I really don't believe their idea about not promoting from T1 into T2 until the page has fallen into B1. This helped to minimize the change from the existing ARC code. In some desultory testing with pgbench, there was no significant performance difference among the three algorithms, which leads me to think that simplified 2Q might be the best bet (it certainly has the smallest memory footprint). But it would be a good idea to do some more measurements before believing that. I hope that Mark Wong can give these a try on his setup soon. These should apply against CVS tip of either HEAD or REL8_0_STABLE branch; note that "CVS tip" includes the bufmgr refactoring patch I applied earlier today. Comments? regards, tom lane *** src/backend/storage/buffer/freelist.c.orig Thu Feb 3 18:29:11 2005 --- src/backend/storage/buffer/freelist.c Thu Feb 3 20:54:08 2005 *** *** 34,47 * Definitions for the buffer replacement strategy */ #define STRAT_LIST_UNUSED (-1) ! #define STRAT_LIST_B1 0 ! #define STRAT_LIST_T1 1 ! #define STRAT_LIST_T2 2 ! #define STRAT_LIST_B2 3 ! #define STRAT_NUM_LISTS 4 /* ! * The Cache Directory Block (CDB) of the Adaptive Replacement Cache (ARC) */ typedef struct { --- 34,49 * Definitions for the buffer replacement strategy */ #define STRAT_LIST_UNUSED (-1) ! #define STRAT_LIST_B1 0 /* A1out */ ! #define STRAT_LIST_T1 1 /* A1 */ ! #define STRAT_LIST_T2 2 /* Am */ ! #define STRAT_NUM_LISTS 3 /* ! * We have a cache directory block (CDB) for every file page currently being ! * tracked by the strategy module. There can be more CDBs than there are ! * actual shared buffers, allowing pages no longer in cache to still be ! * tracked. */ typedef struct { *** *** 55,68 } BufferStrategyCDB; /* ! * The shared ARC control information. */ typedef struct { int target_T1_size; /* What T1 size are we aiming for */ int listUnusedCDB; /* All unused StrategyCDB */ ! int listHead[STRAT_NUM_LISTS]; /* ARC lists B1, T1, T2 ! * and B2 */ int listTail[STRAT_NUM_LISTS]; int listSize[STRAT_NUM_LISTS]; Buffer listFreeBuffers;/* List of unused buffers */ --- 57,69 } BufferStrategyCDB; /* ! * The shared strategy control information. */ typedef struct { int target_T1_size; /* What T1 size are we aiming for */ int listUnusedCDB; /* All unused StrategyCDB */ ! int listHead[STRAT_NUM_LISTS]; /* CDB lists */ int listTail[STRAT_NUM_LISTS]; int listSize[STRAT_NUM_LISTS]; Buffer listFreeBuffers;/* List of unused buffers */ *** *** 91,97 #define B1_LENGTH (StrategyControl->listSize[STRAT_LIST_B1]) #define T1_LENGTH (StrategyControl->listSize[STRAT_LIST_T1]) #define T2_LENGTH (StrategyControl->listSize[STRAT_LIST_T2]) - #define B2_LENGTH (StrategyControl->listSize[STRAT_LIST_B2]) /* --- 92,97 *** *** 178,185 longall_hit, b1_hit, t1_hit, ! t2_hit, ! b2_hit; int id, t1_clean, t2_clean; --- 178,184 longall_hit, b1_hit, t1_hit, ! t2_hit; int id,
Re: [PATCHES] Refactoring lock.c
Heikki Linnakangas wrote: There's two almost identical pieces of code in LockRelease and LockReleaseAll that do the opposite of GrantLock. Here's a small patch that replaces those pieces with a static UnGrantLock function. Applied to HEAD with a few trivial editorial fixes. Thanks for the patch. -Neil ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[PATCHES] buffer manager refactoring to isolate ARC algorithm
I'm planning to apply the attached patch to hide the ARC-related data structures in freelist.c. There are no data structure or algorithm changes, just removal of ARC details from the shared header buf_internals.h, plus minor refactoring to hide the knowledge that ARC wants a twice-normal-size buffer lookup hashtable. This should allow us to make the ARC-or-not changes within freelist.c only. I did leave the "bufNext" fields of BufferDesc as-is. Strictly speaking this list is an internal data structure of freelist.c, but moving these fields out of BufferDesc would require a larger change than seems worthwhile, especially seeing that an LRU algorithm could make use of the same list. Any objections? regards, tom lane *** src/backend/storage/buffer/buf_init.c.orig Fri Dec 31 17:46:08 2004 --- src/backend/storage/buffer/buf_init.c Thu Feb 3 14:51:18 2005 *** *** 73,79 *aborts, it should only unpin the buffers exactly the number of times it *has pinned them, so that it will not blow away buffers of another *backend. - * */ --- 73,78 *** *** 120,133 block = BufferBlocks; /* !* link the buffers into a single linked list. This will become !* the LIFO list of unused buffers returned by !* StrategyGetBuffer(). */ for (i = 0; i < NBuffers; block += BLCKSZ, buf++, i++) { Assert(ShmemIsValid((unsigned long) block)); buf->bufNext = i + 1; CLEAR_BUFFERTAG(buf->tag); --- 119,135 block = BufferBlocks; /* !* Initialize all the buffer headers. */ for (i = 0; i < NBuffers; block += BLCKSZ, buf++, i++) { Assert(ShmemIsValid((unsigned long) block)); + /* +* The bufNext fields link together all totally-unused buffers. +* Subsequent management of this list is done by +* StrategyGetBuffer(). +*/ buf->bufNext = i + 1; CLEAR_BUFFERTAG(buf->tag); *** *** 142,148 buf->wait_backend_id = 0; } ! /* Correct last entry */ BufferDescriptors[NBuffers - 1].bufNext = -1; LWLockRelease(BufMgrLock); --- 144,150 buf->wait_backend_id = 0; } ! /* Correct last entry of linked list */ BufferDescriptors[NBuffers - 1].bufNext = -1; LWLockRelease(BufMgrLock); *** *** 178,184 /* * Convert shmem offsets into addresses as seen by this process. This !* is just to speed up the BufferGetBlock() macro. */ for (i = 0; i < NBuffers; i++) BufferBlockPointers[i] = (Block) MAKE_PTR(BufferDescriptors[i].data); --- 180,187 /* * Convert shmem offsets into addresses as seen by this process. This !* is just to speed up the BufferGetBlock() macro. It is OK to do this !* without any lock since the data pointers never change. */ for (i = 0; i < NBuffers; i++) BufferBlockPointers[i] = (Block) MAKE_PTR(BufferDescriptors[i].data); *** *** 201,214 /* size of data pages */ size += NBuffers * MAXALIGN(BLCKSZ); ! /* size of buffer hash table */ ! size += hash_estimate_size(NBuffers * 2, sizeof(BufferLookupEnt)); ! ! /* size of the shared replacement strategy control block */ ! size += MAXALIGN(sizeof(BufferStrategyControl)); ! ! /* size of the ARC directory blocks */ ! size += MAXALIGN(NBuffers * 2 * sizeof(BufferStrategyCDB)); return size; } --- 204,211 /* size of data pages */ size += NBuffers * MAXALIGN(BLCKSZ); ! /* size of stuff controlled by freelist.c */ ! size += StrategyShmemSize(); return size; } *** src/backend/storage/buffer/buf_table.c.orig Fri Dec 31 17:46:08 2004 --- src/backend/storage/buffer/buf_table.c Thu Feb 3 14:40:17 2005 *** *** 1,11 /*- * * buf_table.c ! * routines for finding buffers in the buffer pool. * ! * NOTE: these days, what this table actually provides is a mapping from ! * BufferTags to CDB indexes, not directly to buffers.The function names ! * are thus slight misnomers. * * Note: all routines in this file assume that the BufMgrLock is held * by the caller, so no synchronization is needed. ---
Re: [PATCHES] dbsize patch
Neil, do you have a verdict on this patch? On Friday January 28 2005 10:30, Ed L. wrote: > If the C code for the prior dbsize patch is not acceptable for > whatever reason, here's a SQL-based patch to replace it. It's > not a drop-in for 7.3/7.4 as I'd hoped, only an 8.1 patch. I > believe it is functionally equivalent to the C patch, but > simpler, shorter, and probably a tad slower. I also removed > the README section on how to aggregate since it was > incomplete/incorrect (it didn't count toasted indices) and > added a SQL function that itemizes the size for a relation's > table and index data (helpful to us in identifying bloat, > measuring performance, capacity estimation, etc). > > Ed ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PATCHES] Refactoring lock.c
On Thu, 3 Feb 2005, Neil Conway wrote: On Wed, 2005-02-02 at 21:41 +0200, Heikki Linnakangas wrote: There's two almost identical pieces of code in LockRelease and LockReleaseAll that do the opposite of GrantLock. Here's a small patch that replaces those pieces with a static UnGrantLock function. LockReleaseAll() did not update the holdMask bits for a released proclock, but it will do so now. That's okay because we're removing the proclock, right? Right. - Heikki ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] [HACKERS] WAL: O_DIRECT and multipage-writer (+ memory leak)
Hello everyone. I fixed two bugs in the patch that I sent before. Check and test new one, please. 1. Fix update timing of Write->curridx. (pointed by Tom) Change to update it soon after write(). 2. Fix buffer alignment routine on 64bit cpu. (pointed by Mark) I checked it on Xeon EM64T and it worked properly, but I don't have IA64... BTW, I found memory leak in BootStrapXLOG(). The buffer allocated by malloc() is not free()ed. ISSUE_BOOTSTRAP_MEMORYLEAK in this patch points out it. (But this leak is not serious, because this function is called only once.) ITAGAKI Takahiro xlog.c.diff Description: Binary data ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly