On Tue, Jan  8, 2013 at 08:40:44PM -0500, Andrew Dunstan wrote:
> 
> On 01/08/2013 08:08 PM, Tom Lane wrote:
> >Robert Haas <robertmh...@gmail.com> writes:
> >>On Tue, Jan 8, 2013 at 7:17 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >>>...  And I don't especially like the idea of trying to
> >>>make it depend directly on the box's physical RAM, for the same
> >>>practical reasons Robert mentioned.
> >>For the record, I don't believe those problems would be particularly
> >>hard to solve.
> >Well, the problem of "find out the box's physical RAM" is doubtless
> >solvable if we're willing to put enough sweat and tears into it, but
> >I'm dubious that it's worth the trouble.  The harder part is how to know
> >if the box is supposed to be dedicated to the database.  Bear in mind
> >that the starting point of this debate was the idea that we're talking
> >about an inexperienced DBA who doesn't know about any configuration knob
> >we might provide for the purpose.
> >
> >I'd prefer to go with a default that's predictable and not totally
> >foolish --- and some multiple of shared_buffers seems like it'd fit the
> >bill.
> 
> +1. That seems to be by far the biggest bang for the buck. Anything
> else will surely involve a lot more code for not much more benefit.

I have developed the attached patch which implements an auto-tuned
effective_cache_size which is 4x the size of shared buffers.  I had to
set effective_cache_size to its old 128MB default so the EXPLAIN
regression tests would pass unchanged.

I considered a new available_ram variable but that just gives us another
variable, and in a way shared_buffers is a fixed amount, while
effective_cache_size is an estimate, so I thought driving everything
from shared_buffers made sense.

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
new file mode 100644
index 23ebc11..de2374b
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*************** include 'filename'
*** 2758,2764 ****
  
         <para>
          Random access to mechanical disk storage is normally much more expensive
!         than four-times sequential access.  However, a lower default is used
          (4.0) because the majority of random accesses to disk, such as indexed
          reads, are assumed to be in cache.  The default value can be thought of
          as modeling random access as 40 times slower than sequential, while
--- 2758,2764 ----
  
         <para>
          Random access to mechanical disk storage is normally much more expensive
!         than four times sequential access.  However, a lower default is used
          (4.0) because the majority of random accesses to disk, such as indexed
          reads, are assumed to be in cache.  The default value can be thought of
          as modeling random access as 40 times slower than sequential, while
*************** include 'filename'
*** 2841,2849 ****
        <listitem>
         <para>
          Sets the planner's assumption about the effective size of the
!         disk cache that is available to a single query.  This is
!         factored into estimates of the cost of using an index; a
!         higher value makes it more likely index scans will be used, a
          lower value makes it more likely sequential scans will be
          used. When setting this parameter you should consider both
          <productname>PostgreSQL</productname>'s shared buffers and the
--- 2841,2857 ----
        <listitem>
         <para>
          Sets the planner's assumption about the effective size of the
!         disk cache that is available to a single query.  The default
!         setting of -1 selects a size equal to four times the size of <xref
!         linkend="guc-shared-buffers">, but not less than the size of one
!         shared buffer page, typically <literal>8kB</literal>.  This value
!         can be set manually if the automatic choice is too large or too
!         small.
!        </para>
! 
!        <para>
!         This value is factored into estimates of the cost of using an index;
!         a higher value makes it more likely index scans will be used, a
          lower value makes it more likely sequential scans will be
          used. When setting this parameter you should consider both
          <productname>PostgreSQL</productname>'s shared buffers and the
*************** include 'filename'
*** 2855,2862 ****
          memory allocated by <productname>PostgreSQL</productname>, nor
          does it reserve kernel disk cache; it is used only for estimation
          purposes.  The system also does not assume data remains in
!         the disk cache between queries.  The default is 128 megabytes
!         (<literal>128MB</>).
         </para>
        </listitem>
       </varlistentry>
--- 2863,2872 ----
          memory allocated by <productname>PostgreSQL</productname>, nor
          does it reserve kernel disk cache; it is used only for estimation
          purposes.  The system also does not assume data remains in
!         the disk cache between queries.  The auto-tuning
!         selected by the default setting of -1 should give reasonable
!         results if this database cluster is can utilize most of the memory
!         on this server.
         </para>
        </listitem>
       </varlistentry>
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
new file mode 100644
index 0caac75..82c66f7
*** a/src/backend/optimizer/path/costsize.c
--- b/src/backend/optimizer/path/costsize.c
***************
*** 87,92 ****
--- 87,93 ----
  #include "optimizer/planmain.h"
  #include "optimizer/restrictinfo.h"
  #include "parser/parsetree.h"
+ #include "utils/guc.h"
  #include "utils/lsyscache.h"
  #include "utils/selfuncs.h"
  #include "utils/spccache.h"
***************
*** 95,108 ****
  
  #define LOG2(x)  (log(x) / 0.693147180559945)
  
- 
  double		seq_page_cost = DEFAULT_SEQ_PAGE_COST;
  double		random_page_cost = DEFAULT_RANDOM_PAGE_COST;
  double		cpu_tuple_cost = DEFAULT_CPU_TUPLE_COST;
  double		cpu_index_tuple_cost = DEFAULT_CPU_INDEX_TUPLE_COST;
  double		cpu_operator_cost = DEFAULT_CPU_OPERATOR_COST;
  
! int			effective_cache_size = DEFAULT_EFFECTIVE_CACHE_SIZE;
  
  Cost		disable_cost = 1.0e10;
  
--- 96,108 ----
  
  #define LOG2(x)  (log(x) / 0.693147180559945)
  
  double		seq_page_cost = DEFAULT_SEQ_PAGE_COST;
  double		random_page_cost = DEFAULT_RANDOM_PAGE_COST;
  double		cpu_tuple_cost = DEFAULT_CPU_TUPLE_COST;
  double		cpu_index_tuple_cost = DEFAULT_CPU_INDEX_TUPLE_COST;
  double		cpu_operator_cost = DEFAULT_CPU_OPERATOR_COST;
  
! int			effective_cache_size = -1;
  
  Cost		disable_cost = 1.0e10;
  
*************** cost_index(IndexPath *path, PlannerInfo
*** 456,461 ****
--- 456,507 ----
  	path->path.total_cost = startup_cost + run_cost;
  }
  
+ void
+ set_default_effective_cache_size(void)
+ {
+ 	/*
+ 	 * If the value of effective_cache_size is -1, use the preferred
+ 	 * auto-tune value.
+ 	 */
+ 	if (effective_cache_size == -1)
+ 	{
+ 		char		buf[32];
+ 
+ 		snprintf(buf, sizeof(buf), "%d", NBuffers * DEFAULT_EFFECTIVE_CACHE_SIZE_MULTI);
+ 		SetConfigOption("effective_cache_size", buf, PGC_POSTMASTER, PGC_S_OVERRIDE);
+ 	}
+ 	Assert(effective_cache_size > 0);
+ }
+ 
+ /*
+  * GUC check_hook for effective_cache_size
+  */
+ bool
+ check_effective_cache_size(int *newval, void **extra, GucSource source)
+ {
+ 	/*
+ 	 * -1 indicates a request for auto-tune.
+ 	 */
+ 	if (*newval == -1)
+ 	{
+ 		/*
+ 		 * If we haven't yet changed the boot_val default of -1, just let it
+ 		 * be.	We'll fix it in index_pages_fetched
+ 		 */
+ 		if (effective_cache_size == -1)
+ 			return true;
+ 
+ 		/* Otherwise, substitute the auto-tune value */
+ 		*newval = NBuffers * DEFAULT_EFFECTIVE_CACHE_SIZE_MULTI;
+ 	}
+ 
+ 	/* set minimum? */
+ 	if (*newval < 1)
+ 		*newval = 1;
+ 
+ 	return true;
+ }
+ 
  /*
   * index_pages_fetched
   *	  Estimate the number of pages actually fetched after accounting for
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
new file mode 100644
index 01d2618..56f0405
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
***************
*** 118,123 ****
--- 118,124 ----
  #include "utils/builtins.h"
  #include "utils/datetime.h"
  #include "utils/dynamic_loader.h"
+ #include "utils/guc.h"
  #include "utils/memutils.h"
  #include "utils/ps_status.h"
  #include "utils/timeout.h"
*************** SubPostmasterMain(int argc, char *argv[]
*** 4470,4475 ****
--- 4471,4478 ----
  	memset(&port, 0, sizeof(Port));
  	read_backend_variables(argv[2], &port);
  
+ 	set_default_effective_cache_size();
+ 
  	/*
  	 * Set reference point for stack-depth checking
  	 */
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
new file mode 100644
index 7d297bc..7705430
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
*************** static struct config_int ConfigureNamesI
*** 2398,2405 ****
  			GUC_UNIT_BLOCKS,
  		},
  		&effective_cache_size,
! 		DEFAULT_EFFECTIVE_CACHE_SIZE, 1, INT_MAX,
! 		NULL, NULL, NULL
  	},
  
  	{
--- 2398,2405 ----
  			GUC_UNIT_BLOCKS,
  		},
  		&effective_cache_size,
! 		-1, -1, INT_MAX,
! 		check_effective_cache_size, NULL, NULL
  	},
  
  	{
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
new file mode 100644
index a052944..444ab74
*** a/src/include/optimizer/cost.h
--- b/src/include/optimizer/cost.h
***************
*** 27,33 ****
  #define DEFAULT_CPU_INDEX_TUPLE_COST 0.005
  #define DEFAULT_CPU_OPERATOR_COST  0.0025
  
! #define DEFAULT_EFFECTIVE_CACHE_SIZE  16384		/* measured in pages */
  
  typedef enum
  {
--- 27,33 ----
  #define DEFAULT_CPU_INDEX_TUPLE_COST 0.005
  #define DEFAULT_CPU_OPERATOR_COST  0.0025
  
! #define DEFAULT_EFFECTIVE_CACHE_SIZE_MULTI 4
  
  typedef enum
  {
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
new file mode 100644
index 99211c1..f5050ec
*** a/src/include/utils/guc.h
--- b/src/include/utils/guc.h
*************** extern void assign_search_path(const cha
*** 386,391 ****
--- 386,393 ----
  
  /* in access/transam/xlog.c */
  extern bool check_wal_buffers(int *newval, void **extra, GucSource source);
+ extern bool check_effective_cache_size(int *newval, void **extra, GucSource source);
+ extern void set_default_effective_cache_size(void);
  extern void assign_xlog_sync_method(int new_sync_method, void *extra);
  
  #endif   /* GUC_H */
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
new file mode 100644
index c94ac61..d46ff26
*** a/src/test/regress/expected/join.out
--- b/src/test/regress/expected/join.out
*************** where thousand = (q1 + q2);
*** 2713,2718 ****
--- 2713,2719 ----
  --
  -- test placement of movable quals in a parameterized join tree
  --
+ set effective_cache_size = '128MB';
  explain (costs off)
  select * from tenk1 t1 left join
    (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
new file mode 100644
index 351400f..1a4dc03
*** a/src/test/regress/sql/join.sql
--- b/src/test/regress/sql/join.sql
*************** where thousand = (q1 + q2);
*** 711,716 ****
--- 711,718 ----
  -- test placement of movable quals in a parameterized join tree
  --
  
+ set effective_cache_size = '128MB';
+ 
  explain (costs off)
  select * from tenk1 t1 left join
    (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to