Re: [PATCHES] Proof-of-concept ARC removal patches

2005-02-03 Thread Simon Riggs
> Tom Lane wrote
> Attached are two different variants of a patch to remove the ARC
> cache algorithm in favor of variants of the "2Q" algorithm.
>
> The first patch approximates the "full 2Q" algorithm described by
> Johnson and Shasha, while the second approximates their
> "simplified 2Q"
> method.  Full 2Q uses a list of pages that were recently in cache but
> no longer are, while simplified 2Q does not; the first patch is
> therefore a smaller change to the existing code.

"Full 2Q" is the right one, I think.

> The patches are not exactly the J&S algorithms, in part because I kept
> the special cases for VACUUM, and in part because I really
> don't believe
> their idea about not promoting from T1 into T2 until the page
> has fallen
> into B1.  This helped to minimize the change from the
> existing ARC code.

Agreed.

> In some desultory testing with pgbench, there was no significant
> performance difference among the three algorithms, which leads me to
> think that simplified 2Q might be the best bet (it certainly has the
> smallest memory footprint).  But it would be a good idea to do some
> more measurements before believing that.  I hope that Mark
> Wong can give
> these a try on his setup soon.

Will give these a whirl...

> Comments?
>

The use of 25% T1 and 75% T2 is probably the only thing to discuss, for
me.

Johnson & Sasha's results show that having a larger T1 makes the system
more responsive to changes, while a larger T2 gives a better longer term
hit rate. J&S's results show that higher T1/T2 ratios are better with
smaller caches. The CARS results show that keeping T1 above a certain
minimum size is better also.

>From that, I'd suggest we choose a T1/T2 ratio that changes according to
how you start shared_buffers. If we had another GUC at all, it should be
one that isn't listed in the postgresql.conf file at all...since it
would be easy to misuse.

A default setting of something like T1 as % of total (just roughly)
= 50% when shared_buffers <= 1000
= 25% when shared_buffers = 5000
= 10% when shared_buffers = 2
with smoothing...

Best Regards, Simon Riggs






---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PATCHES] [pgsql-patches] Proof-of-concept ARC removal patches

2005-02-03 Thread Serguei Mokhov
> Date: Thu, 03 Feb 2005 21:41:52 -0500
> From: Tom Lane <[EMAIL PROTECTED]>
> Subject: Proof-of-concept ARC removal patches

Re: subject -- shouldn't it be "replacement" and _not_ "removal" of ARC?

> Attached are two different variants of a patch to remove the ARC
> cache algorithm in favor of variants of the "2Q" algorithm.
>
...
> In some desultory testing with pgbench, there was no significant
> performance difference among the three algorithms, which leads me to
> think that simplified 2Q might be the best bet (it certainly has the
> smallest memory footprint).  But it would be a good idea to do some
> more measurements before believing that.  I hope that Mark Wong can give
> these a try on his setup soon.
>
...
>
> Comments?

Wouldn't it been much more usable and fair to be able to "load" any of those
modules as I am, the user see fit? So, I could swap the algos on the fly
whichever better suits me? That will also lay down ground suitable for easier
performance testing between the modules should people write their own, new
ones.

I'd keep the ARC around and not remove as a proof-of-concept module as well.
Whatever the status of the IBM's ARC patent may become, the US customers will
simply not use the module. Researchers, however, will always be able to tell
that their modules are better or worse by comparing them to ARC (or 2Q or
whatever).

>   regards, tom lane


--
Serguei A. Mokhov|  /~\The ASCII
Computer Science Department  |  \ / Ribbon Campaign
Concordia University |   XAgainst HTML
Montreal, Quebec, Canada |  / \  Email!




---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


[PATCHES] Proof-of-concept ARC removal patches

2005-02-03 Thread Tom Lane
Attached are two different variants of a patch to remove the ARC
cache algorithm in favor of variants of the "2Q" algorithm.

The first patch approximates the "full 2Q" algorithm described by
Johnson and Shasha, while the second approximates their "simplified 2Q"
method.  Full 2Q uses a list of pages that were recently in cache but
no longer are, while simplified 2Q does not; the first patch is
therefore a smaller change to the existing code.

To show how closely the code is related, I kept the ARC names of T1, T2,
and B1 for the corresponding 2Q lists (A1, Am, A1out respectively).
We'd probably rename to use the 2Q names if we apply this, but the
patches are smaller and easier to review as they are.

The patches are not exactly the J&S algorithms, in part because I kept
the special cases for VACUUM, and in part because I really don't believe
their idea about not promoting from T1 into T2 until the page has fallen
into B1.  This helped to minimize the change from the existing ARC code.

In some desultory testing with pgbench, there was no significant
performance difference among the three algorithms, which leads me to
think that simplified 2Q might be the best bet (it certainly has the
smallest memory footprint).  But it would be a good idea to do some
more measurements before believing that.  I hope that Mark Wong can give
these a try on his setup soon.

These should apply against CVS tip of either HEAD or REL8_0_STABLE
branch; note that "CVS tip" includes the bufmgr refactoring patch
I applied earlier today.

Comments?

regards, tom lane

*** src/backend/storage/buffer/freelist.c.orig  Thu Feb  3 18:29:11 2005
--- src/backend/storage/buffer/freelist.c   Thu Feb  3 20:54:08 2005
***
*** 34,47 
   * Definitions for the buffer replacement strategy
   */
  #define STRAT_LIST_UNUSED (-1)
! #define STRAT_LIST_B1 0
! #define STRAT_LIST_T1 1
! #define STRAT_LIST_T2 2
! #define STRAT_LIST_B2 3
! #define STRAT_NUM_LISTS   4
  
  /*
!  * The Cache Directory Block (CDB) of the Adaptive Replacement Cache (ARC)
   */
  typedef struct
  {
--- 34,49 
   * Definitions for the buffer replacement strategy
   */
  #define STRAT_LIST_UNUSED (-1)
! #define STRAT_LIST_B1 0   /* A1out */
! #define STRAT_LIST_T1 1   /* A1 */
! #define STRAT_LIST_T2 2   /* Am */
! #define STRAT_NUM_LISTS   3
  
  /*
!  * We have a cache directory block (CDB) for every file page currently being
!  * tracked by the strategy module.  There can be more CDBs than there are
!  * actual shared buffers, allowing pages no longer in cache to still be
!  * tracked.
   */
  typedef struct
  {
***
*** 55,68 
  } BufferStrategyCDB;
  
  /*
!  * The shared ARC control information.
   */
  typedef struct
  {
int target_T1_size; /* What T1 size are we aiming 
for */
int listUnusedCDB;  /* All unused StrategyCDB */
!   int listHead[STRAT_NUM_LISTS];  /* ARC 
lists B1, T1, T2
!   
 * and B2 */
int listTail[STRAT_NUM_LISTS];
int listSize[STRAT_NUM_LISTS];
Buffer  listFreeBuffers;/* List of unused buffers */
--- 57,69 
  } BufferStrategyCDB;
  
  /*
!  * The shared strategy control information.
   */
  typedef struct
  {
int target_T1_size; /* What T1 size are we aiming 
for */
int listUnusedCDB;  /* All unused StrategyCDB */
!   int listHead[STRAT_NUM_LISTS];  /* CDB 
lists */
int listTail[STRAT_NUM_LISTS];
int listSize[STRAT_NUM_LISTS];
Buffer  listFreeBuffers;/* List of unused buffers */
***
*** 91,97 
  #define B1_LENGTH (StrategyControl->listSize[STRAT_LIST_B1])
  #define T1_LENGTH (StrategyControl->listSize[STRAT_LIST_T1])
  #define T2_LENGTH (StrategyControl->listSize[STRAT_LIST_T2])
- #define B2_LENGTH (StrategyControl->listSize[STRAT_LIST_B2])
  
  
  /*
--- 92,97 
***
*** 178,185 
longall_hit,
b1_hit,
t1_hit,
!   t2_hit,
!   b2_hit;
int id,
t1_clean,
t2_clean;
--- 178,184 
longall_hit,
b1_hit,
t1_hit,
!   t2_hit;
int id,
  

Re: [PATCHES] Refactoring lock.c

2005-02-03 Thread Neil Conway
Heikki Linnakangas wrote:
There's two almost identical pieces of code in LockRelease and 
LockReleaseAll that do the opposite of GrantLock.

Here's a small patch that replaces those pieces with a static 
UnGrantLock function.
Applied to HEAD with a few trivial editorial fixes.
Thanks for the patch.
-Neil
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


[PATCHES] buffer manager refactoring to isolate ARC algorithm

2005-02-03 Thread Tom Lane
I'm planning to apply the attached patch to hide the ARC-related data
structures in freelist.c.  There are no data structure or algorithm
changes, just removal of ARC details from the shared header
buf_internals.h, plus minor refactoring to hide the knowledge that ARC
wants a twice-normal-size buffer lookup hashtable.  This should allow us
to make the ARC-or-not changes within freelist.c only.

I did leave the "bufNext" fields of BufferDesc as-is.  Strictly speaking
this list is an internal data structure of freelist.c, but moving these
fields out of BufferDesc would require a larger change than seems
worthwhile, especially seeing that an LRU algorithm could make use of
the same list.

Any objections?

regards, tom lane

*** src/backend/storage/buffer/buf_init.c.orig  Fri Dec 31 17:46:08 2004
--- src/backend/storage/buffer/buf_init.c   Thu Feb  3 14:51:18 2005
***
*** 73,79 
   *aborts, it should only unpin the buffers exactly the number of 
times it
   *has pinned them, so that it will not blow away buffers of 
another
   *backend.
-  *
   */
  
  
--- 73,78 
***
*** 120,133 
block = BufferBlocks;
  
/*
!* link the buffers into a single linked list. This will become
!* the LIFO list of unused buffers returned by
!* StrategyGetBuffer().
 */
for (i = 0; i < NBuffers; block += BLCKSZ, buf++, i++)
{
Assert(ShmemIsValid((unsigned long) block));
  
buf->bufNext = i + 1;
  
CLEAR_BUFFERTAG(buf->tag);
--- 119,135 
block = BufferBlocks;
  
/*
!* Initialize all the buffer headers.
 */
for (i = 0; i < NBuffers; block += BLCKSZ, buf++, i++)
{
Assert(ShmemIsValid((unsigned long) block));
  
+   /*
+* The bufNext fields link together all totally-unused 
buffers.
+* Subsequent management of this list is done by
+* StrategyGetBuffer().
+*/
buf->bufNext = i + 1;
  
CLEAR_BUFFERTAG(buf->tag);
***
*** 142,148 
buf->wait_backend_id = 0;
}
  
!   /* Correct last entry */
BufferDescriptors[NBuffers - 1].bufNext = -1;
  
LWLockRelease(BufMgrLock);
--- 144,150 
buf->wait_backend_id = 0;
}
  
!   /* Correct last entry of linked list */
BufferDescriptors[NBuffers - 1].bufNext = -1;
  
LWLockRelease(BufMgrLock);
***
*** 178,184 
  
/*
 * Convert shmem offsets into addresses as seen by this process. This
!* is just to speed up the BufferGetBlock() macro.
 */
for (i = 0; i < NBuffers; i++)
BufferBlockPointers[i] = (Block) 
MAKE_PTR(BufferDescriptors[i].data);
--- 180,187 
  
/*
 * Convert shmem offsets into addresses as seen by this process. This
!* is just to speed up the BufferGetBlock() macro.  It is OK to do this
!* without any lock since the data pointers never change.
 */
for (i = 0; i < NBuffers; i++)
BufferBlockPointers[i] = (Block) 
MAKE_PTR(BufferDescriptors[i].data);
***
*** 201,214 
/* size of data pages */
size += NBuffers * MAXALIGN(BLCKSZ);
  
!   /* size of buffer hash table */
!   size += hash_estimate_size(NBuffers * 2, sizeof(BufferLookupEnt));
! 
!   /* size of the shared replacement strategy control block */
!   size += MAXALIGN(sizeof(BufferStrategyControl));
! 
!   /* size of the ARC directory blocks */
!   size += MAXALIGN(NBuffers * 2 * sizeof(BufferStrategyCDB));
  
return size;
  }
--- 204,211 
/* size of data pages */
size += NBuffers * MAXALIGN(BLCKSZ);
  
!   /* size of stuff controlled by freelist.c */
!   size += StrategyShmemSize();
  
return size;
  }
*** src/backend/storage/buffer/buf_table.c.orig Fri Dec 31 17:46:08 2004
--- src/backend/storage/buffer/buf_table.c  Thu Feb  3 14:40:17 2005
***
*** 1,11 
  /*-
   *
   * buf_table.c
!  *  routines for finding buffers in the buffer pool.
   *
!  * NOTE: these days, what this table actually provides is a mapping from
!  * BufferTags to CDB indexes, not directly to buffers.The function 
names
!  * are thus slight misnomers.
   *
   * Note: all routines in this file assume that the BufMgrLock is held
   * by the caller, so no synchronization is needed.
--- 

Re: [PATCHES] dbsize patch

2005-02-03 Thread Ed L.
Neil, do you have a verdict on this patch?

On Friday January 28 2005 10:30, Ed L. wrote:
> If the C code for the prior dbsize patch is not acceptable for
> whatever reason, here's a SQL-based patch to replace it.  It's
> not a drop-in for 7.3/7.4 as I'd hoped, only an 8.1 patch.  I
> believe it is functionally equivalent to the C patch, but
> simpler, shorter, and probably a tad slower. I also removed
> the README section on how to aggregate since it was
> incomplete/incorrect (it didn't count toasted indices) and
> added a SQL function that itemizes the size for a relation's
> table and index data (helpful to us in identifying bloat,
> measuring performance, capacity estimation, etc).
>
> Ed


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PATCHES] Refactoring lock.c

2005-02-03 Thread Heikki Linnakangas
On Thu, 3 Feb 2005, Neil Conway wrote:
On Wed, 2005-02-02 at 21:41 +0200, Heikki Linnakangas wrote:
There's two almost identical pieces of code in LockRelease and
LockReleaseAll that do the opposite of GrantLock.
Here's a small patch that replaces those pieces with a static UnGrantLock
function.
LockReleaseAll() did not update the holdMask bits for a released
proclock, but it will do so now. That's okay because we're removing the
proclock, right?
Right.
- Heikki
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PATCHES] [HACKERS] WAL: O_DIRECT and multipage-writer (+ memory leak)

2005-02-03 Thread ITAGAKI Takahiro
Hello everyone.

I fixed two bugs in the patch that I sent before.
Check and test new one, please.

1. Fix update timing of Write->curridx. (pointed by Tom)
 Change to update it soon after write().

2. Fix buffer alignment routine on 64bit cpu. (pointed by Mark)
 I checked it on Xeon EM64T and it worked properly, but I don't have IA64...


BTW, I found memory leak in BootStrapXLOG(). The buffer allocated by malloc()
is not free()ed. ISSUE_BOOTSTRAP_MEMORYLEAK in this patch points out it.
(But this leak is not serious, because this function is called only once.)


ITAGAKI Takahiro


xlog.c.diff
Description: Binary data

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly