[PATCHES] cost_nonsequential_access()

2004-06-08 Thread Manfred Koizar
The comment describing cost_nonsequential_access() says that the two
functions meet in the middle.  They meet at random_page_cost/2,
however, not in the middle between 1 and random_page_cost.  For
random_page_cost  2 the result can be less than 1 for relpages near
effective_cache_size.  I don't think that this was intended.

This patch makes sure that cost_nonsequential_access() is always between
1 and randon_page_cost and the functions meet a (1+random_page_cost)/2.

Servus
 Manfred
diff -Ncr ../base/src/backend/optimizer/path/costsize.c 
src/backend/optimizer/path/costsize.c
*** ../base/src/backend/optimizer/path/costsize.c   Tue Jun  1 05:02:52 2004
--- src/backend/optimizer/path/costsize.c   Tue Jun  8 08:34:27 2004
***
*** 189,199 
   * for now by assuming we are given an effective_cache_size parameter.
   *
   * Given a guesstimated cache size, we estimate the actual I/O cost per page
!  * with the entirely ad-hoc equations:
   *if relpages = effective_cache_size:
!  *random_page_cost * (1 - (effective_cache_size/relpages)/2)
   *if relpages  effective_cache_size:
!  *1 + (random_page_cost/2-1) * (relpages/effective_cache_size) ** 2
   * These give the right asymptotic behavior (= 1.0 as relpages becomes
   * small, = random_page_cost as it becomes large) and meet in the middle
   * with the estimate that the cache is about 50% effective for a relation
--- 189,200 
   * for now by assuming we are given an effective_cache_size parameter.
   *
   * Given a guesstimated cache size, we estimate the actual I/O cost per page
!  * with the entirely ad-hoc equations (writing rpc for random_page_cost and
!  * relsize for relpages/effective_cache_size):
   *if relpages = effective_cache_size:
!  *rpc - (rpc-1)/2 * 1/relsize
   *if relpages  effective_cache_size:
!  *1 + (rpc-1)/2 * relsize
   * These give the right asymptotic behavior (= 1.0 as relpages becomes
   * small, = random_page_cost as it becomes large) and meet in the middle
   * with the estimate that the cache is about 50% effective for a relation
***
*** 213,221 
relsize = relpages / effective_cache_size;
  
if (relsize = 1.0)
!   return random_page_cost * (1.0 - 0.5 / relsize);
else
!   return 1.0 + (random_page_cost * 0.5 - 1.0) * relsize * relsize;
  }
  
  /*
--- 214,222 
relsize = relpages / effective_cache_size;
  
if (relsize = 1.0)
!   return random_page_cost - (random_page_cost - 1.0) / 2.0 / relsize;
else
!   return 1.0 + (random_page_cost - 1.0) / 2.0 * relsize;
  }
  
  /*

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] cost_nonsequential_access()

2004-06-08 Thread Tom Lane
Manfred Koizar [EMAIL PROTECTED] writes:
 The comment describing cost_nonsequential_access() says that the two
 functions meet in the middle.  They meet at random_page_cost/2,
 however, not in the middle between 1 and random_page_cost.  For
 random_page_cost  2 the result can be less than 1 for relpages near
 effective_cache_size.  I don't think that this was intended.

You're right, I failed to consider that random_page_cost might be less
than 2.

 This patch makes sure that cost_nonsequential_access() is always between
 1 and randon_page_cost and the functions meet a (1+random_page_cost)/2.

This patch seems to do considerably more violence to the equations than
is needed to cover that oversight, though.  The old behavior was
intentionally nonlinear in relsize; this is not.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PATCHES] cost_nonsequential_access()

2004-06-08 Thread Manfred Koizar
On Tue, 08 Jun 2004 11:13:43 -0400, Tom Lane [EMAIL PROTECTED] wrote:
This patch seems to do considerably more violence to the equations than
is needed to cover that oversight, though.  The old behavior was
intentionally nonlinear in relsize; this is not.

The comment says entirely ad-hoc and I didn't see a particular reason
why the lower half should be nonlinear in relsize while the upper half
is linear in 1/relsize.  So I opted for the more esthetic symmetrical
function.  :-)

http://www.pivot.at/pg/costsize.jpg original
http://www.pivot.at/pg/costsize_2.jpg nonlinear/linear
http://www.pivot.at/pg/costsize_3a.jpg nonlinear
http://www.pivot.at/pg/costsize_3b.jpg linear

I don't have a strong opinion for either relsize or relsize^2.  So
please add  * relsize or  / relsize as appropriate before you apply
(if you intend to apply).

Servus
 Manfred

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PATCHES] cost_nonsequential_access()

2004-06-08 Thread Tom Lane
Manfred Koizar [EMAIL PROTECTED] writes:
 On Tue, 08 Jun 2004 11:13:43 -0400, Tom Lane [EMAIL PROTECTED] wrote:
 This patch seems to do considerably more violence to the equations than
 is needed to cover that oversight, though.  The old behavior was
 intentionally nonlinear in relsize; this is not.

 The comment says entirely ad-hoc and I didn't see a particular reason
 why the lower half should be nonlinear in relsize while the upper half
 is linear in 1/relsize.

Incremental changes in the relsize fraction are not going to change the
cost much except near 1, so I was after a curve that went like this
(pardon the crude artwork):


rpc***
***

  **
cost *
*
   *
 **
 
 
1.0 *
0   1large
  relsize fraction

I don't think replacing the lower half of this with a straight line
segment is an improvement.

Possibly the relsize axis ought to be measured on a log scale, or
something like that, but that didn't seem to work nicely when relsize
approaches zero.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] cost_nonsequential_access()

2004-06-08 Thread Manfred Koizar
On Tue, 08 Jun 2004 13:13:01 -0400, Tom Lane [EMAIL PROTECTED] wrote:
Possibly the relsize axis ought to be measured on a log scale, or
something like that, but that didn't seem to work nicely when relsize
approaches zero.

In my experiments I used log(relsize) on the x axis, and I don't think
that the graph looks unpleasant for small relsize.  My thought was (and
is) that we are much more interested in whether relpages is 1/100, 1/10,
1, 10, 100 times effective_cache_size than whether it is relpages +/-
1000, 2000, 3000, ...

I played around with some numbers that could be considered fairly
realistic.  You might want to look at the graphs I linked to in the
previous message or download http://www.pivot.at/pg/costsize.sxc.

But I think we are wasting too much effort.  The graphs don't look too
different, whether you use relsize or relsize^2.  Maybe relsize^3 is
optimal?  Nobody knows.  The important part of the patch is that the
result is scaled and shifted into the range 1 to random_page_cost.
Whatever you decide to do is ok with me.

Servus
 Manfred

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]