On Tue, Nov 4, 2014 at 12:04 PM, Etsuro Fujita
<fujita.ets...@lab.ntt.co.jp> wrote:
> IIUC, I think that min = 0 disables fast update, so ISTM that it'd be
> appropriate to set min to some positive value.  And ISTM that the idea of
> using the min value of work_mem is not so bad.

OK. I changed the min value to 64kB.
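
To illustrate why a zero minimum is a problem, here is a minimal standalone sketch of the threshold check, not the patch code itself. The helper names resolve_cleanup_size() and needs_cleanup() are hypothetical, and SKETCH_PAGE_FREESIZE is an assumed stand-in for GIN_PAGE_FREESIZE at the default 8kB block size. The logic mirrors the GinGetPendingListCleanupSize() macro and the comparison in ginHeapTupleFastInsert().

```c
#include <stdbool.h>

/*
 * Assumed stand-in for GIN_PAGE_FREESIZE; the real value is derived from
 * BLCKSZ and the page header sizes.
 */
#define SKETCH_PAGE_FREESIZE 8160L

/*
 * Hypothetical helper mirroring GinGetPendingListCleanupSize(): a reloption
 * value of -1 means "not set for this index", so fall back to the GUC.
 */
static int
resolve_cleanup_size(int reloption_kb, int guc_kb)
{
	return (reloption_kb != -1) ? reloption_kb : guc_kb;
}

/*
 * Hypothetical helper mirroring the check in ginHeapTupleFastInsert():
 * cleanup is needed once the pending list exceeds the threshold (in kB).
 */
static bool
needs_cleanup(long n_pending_pages, int cleanup_size_kb)
{
	return n_pending_pages * SKETCH_PAGE_FREESIZE > cleanup_size_kb * 1024L;
}
```

With a threshold of 0, needs_cleanup() is true as soon as the list holds a single page, so every insertion would trigger a foreground cleanup. With the 64kB minimum, under the assumed page size, the list can hold eight pages before the ninth triggers cleanup.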

> *** 356,361 **** CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ <replaceable
> class="parameter">name</
> --- 356,372 ----
>       </listitem>
>      </varlistentry>
>      </variablelist>
> +    <variablelist>
> +    <varlistentry>
> +     <term><literal>PENDING_LIST_CLEANUP_SIZE</></term>
>
> The above is still in uppercase.

Fixed.

Attached is the updated version of the patch. Thanks for the review!

Regards,

-- 
Fujii Masao
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***************
*** 5911,5916 **** SET XML OPTION { DOCUMENT | CONTENT };
--- 5911,5937 ----
        </listitem>
       </varlistentry>
  
+      <varlistentry id="guc-pending-list-cleanup-size" xreflabel="pending_list_cleanup_size">
+       <term><varname>pending_list_cleanup_size</varname> (<type>integer</type>)
+       <indexterm>
+        <primary><varname>pending_list_cleanup_size</> configuration parameter</primary>
+       </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum size of the GIN pending list which is used
+         when <literal>fastupdate</> is enabled. If the list grows
+         larger than this maximum size, it is cleaned up by moving
+         the entries in it to the main GIN data structure in bulk.
+         The default is four megabytes (<literal>4MB</>). This setting
+         can be overridden for individual GIN indexes by changing
+         storage parameters.
+         See <xref linkend="gin-fast-update"> and <xref linkend="gin-tips">
+         for more information.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
       </variablelist>
      </sect2>
       <sect2 id="runtime-config-client-format">
*** a/doc/src/sgml/gin.sgml
--- b/doc/src/sgml/gin.sgml
***************
*** 728,735 ****
     from the indexed item). As of <productname>PostgreSQL</productname> 8.4,
     <acronym>GIN</> is capable of postponing much of this work by inserting
     new tuples into a temporary, unsorted list of pending entries.
!    When the table is vacuumed, or if the pending list becomes too large
!    (larger than <xref linkend="guc-work-mem">), the entries are moved to the
     main <acronym>GIN</acronym> data structure using the same bulk insert
     techniques used during initial index creation.  This greatly improves
     <acronym>GIN</acronym> index update speed, even counting the additional
--- 728,735 ----
     from the indexed item). As of <productname>PostgreSQL</productname> 8.4,
     <acronym>GIN</> is capable of postponing much of this work by inserting
     new tuples into a temporary, unsorted list of pending entries.
!    When the table is vacuumed, or if the pending list becomes larger than
!    <xref linkend="guc-pending-list-cleanup-size">, the entries are moved to the
     main <acronym>GIN</acronym> data structure using the same bulk insert
     techniques used during initial index creation.  This greatly improves
     <acronym>GIN</acronym> index update speed, even counting the additional
***************
*** 750,756 ****
    <para>
     If consistent response time is more important than update speed,
     use of pending entries can be disabled by turning off the
!    <literal>FASTUPDATE</literal> storage parameter for a
     <acronym>GIN</acronym> index.  See <xref linkend="sql-createindex">
     for details.
    </para>
--- 750,756 ----
    <para>
     If consistent response time is more important than update speed,
     use of pending entries can be disabled by turning off the
!    <literal>fastupdate</literal> storage parameter for a
     <acronym>GIN</acronym> index.  See <xref linkend="sql-createindex">
     for details.
    </para>
***************
*** 812,829 ****
    </varlistentry>
  
    <varlistentry>
!    <term><xref linkend="guc-work-mem"></term>
     <listitem>
      <para>
       During a series of insertions into an existing <acronym>GIN</acronym>
!      index that has <literal>FASTUPDATE</> enabled, the system will clean up
       the pending-entry list whenever the list grows larger than
!      <varname>work_mem</>.  To avoid fluctuations in observed response time,
!      it's desirable to have pending-list cleanup occur in the background
!      (i.e., via autovacuum).  Foreground cleanup operations can be avoided by
!      increasing <varname>work_mem</> or making autovacuum more aggressive.
!      However, enlarging <varname>work_mem</> means that if a foreground
!      cleanup does occur, it will take even longer.
      </para>
     </listitem>
    </varlistentry>
--- 812,837 ----
    </varlistentry>
  
    <varlistentry>
!    <term><xref linkend="guc-pending-list-cleanup-size"></term>
     <listitem>
      <para>
       During a series of insertions into an existing <acronym>GIN</acronym>
!      index that has <literal>fastupdate</> enabled, the system will clean up
       the pending-entry list whenever the list grows larger than
!      <varname>pending_list_cleanup_size</>. To avoid fluctuations in observed
!      response time, it's desirable to have pending-list cleanup occur in the
!      background (i.e., via autovacuum).  Foreground cleanup operations
!      can be avoided by increasing <varname>pending_list_cleanup_size</>
!      or making autovacuum more aggressive.
!      However, enlarging the threshold of the cleanup operation means that
!      if a foreground cleanup does occur, it will take even longer.
!     </para>
!     <para>
!      <varname>pending_list_cleanup_size</> can be overridden for individual
!      GIN indexes by changing storage parameters, which allows each
!      GIN index to have its own cleanup threshold.
!      For example, it's possible to increase the threshold only for a GIN
!      index that is updated heavily, and decrease it otherwise.
      </para>
     </listitem>
    </varlistentry>
*** a/doc/src/sgml/gist.sgml
--- b/doc/src/sgml/gist.sgml
***************
*** 861,867 **** my_distance(PG_FUNCTION_ARGS)
    <para>
     By default, a GiST index build switches to the buffering method when the
     index size reaches <xref linkend="guc-effective-cache-size">. It can
!    be manually turned on or off by the <literal>BUFFERING</literal> parameter
     to the CREATE INDEX command. The default behavior is good for most cases,
     but turning buffering off might speed up the build somewhat if the input
     data is ordered.
--- 861,867 ----
    <para>
     By default, a GiST index build switches to the buffering method when the
     index size reaches <xref linkend="guc-effective-cache-size">. It can
!    be manually turned on or off by the <literal>buffering</literal> parameter
     to the CREATE INDEX command. The default behavior is good for most cases,
     but turning buffering off might speed up the build somewhat if the input
     data is ordered.
*** a/doc/src/sgml/ref/cluster.sgml
--- b/doc/src/sgml/ref/cluster.sgml
***************
*** 46,52 **** CLUSTER [VERBOSE]
     not clustered.  That is, no attempt is made to store new or
     updated rows according to their index order.  (If one wishes, one can
     periodically recluster by issuing the command again.  Also, setting
!    the table's <literal>FILLFACTOR</literal> storage parameter to less than
     100% can aid in preserving cluster ordering during updates, since updated
     rows are kept on the same page if enough space is available there.)
    </para>
--- 46,52 ----
     not clustered.  That is, no attempt is made to store new or
     updated rows according to their index order.  (If one wishes, one can
     periodically recluster by issuing the command again.  Also, setting
!    the table's <literal>fillfactor</literal> storage parameter to less than
     100% can aid in preserving cluster ordering during updates, since updated
     rows are kept on the same page if enough space is available there.)
    </para>
*** a/doc/src/sgml/ref/create_index.sgml
--- b/doc/src/sgml/ref/create_index.sgml
***************
*** 299,305 **** CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
  
     <variablelist>
     <varlistentry>
!     <term><literal>FILLFACTOR</></term>
      <listitem>
       <para>
        The fillfactor for an index is a percentage that determines how full
--- 299,305 ----
  
     <variablelist>
     <varlistentry>
!     <term><literal>fillfactor</></term>
      <listitem>
       <para>
        The fillfactor for an index is a percentage that determines how full
***************
*** 326,332 **** CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
  
     <variablelist>
     <varlistentry>
!     <term><literal>BUFFERING</></term>
      <listitem>
      <para>
       Determines whether the buffering build technique described in
--- 326,332 ----
  
     <variablelist>
     <varlistentry>
!     <term><literal>buffering</></term>
      <listitem>
      <para>
       Determines whether the buffering build technique described in
***************
*** 340,351 **** CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
     </variablelist>
  
     <para>
!     GIN indexes accept a different parameter:
     </para>
  
     <variablelist>
     <varlistentry>
!     <term><literal>FASTUPDATE</></term>
      <listitem>
      <para>
       This setting controls usage of the fast update technique described in
--- 340,351 ----
     </variablelist>
  
     <para>
!     GIN indexes accept different parameters:
     </para>
  
     <variablelist>
     <varlistentry>
!     <term><literal>fastupdate</></term>
      <listitem>
      <para>
       This setting controls usage of the fast update technique described in
***************
*** 358,364 **** CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
  
      <note>
       <para>
!       Turning <literal>FASTUPDATE</> off via <command>ALTER INDEX</> prevents
        future insertions from going into the list of pending index entries,
        but does not in itself flush previous entries.  You might want to
        <command>VACUUM</> the table afterward to ensure the pending list is
--- 358,364 ----
  
      <note>
       <para>
!       Turning <literal>fastupdate</> off via <command>ALTER INDEX</> prevents
        future insertions from going into the list of pending index entries,
        but does not in itself flush previous entries.  You might want to
        <command>VACUUM</> the table afterward to ensure the pending list is
***************
*** 368,373 **** CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
--- 368,384 ----
      </listitem>
     </varlistentry>
     </variablelist>
+    <variablelist>
+    <varlistentry>
+     <term><literal>pending_list_cleanup_size</></term>
+     <listitem>
+     <para>
+      Custom <xref linkend="guc-pending-list-cleanup-size"> parameter.
+      This value is specified in kilobytes.
+     </para>
+     </listitem>
+    </varlistentry>
+    </variablelist>
    </refsect2>
  
    <refsect2 id="SQL-CREATEINDEX-CONCURRENTLY">
*** a/src/backend/access/common/reloptions.c
--- b/src/backend/access/common/reloptions.c
***************
*** 209,214 **** static relopt_int intRelOpts[] =
--- 209,222 ----
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		}, -1, 0, 2000000000
  	},
+ 	{
+ 		{
+ 			"pending_list_cleanup_size",
+ 			"Maximum size of the pending list for this GIN index, in kilobytes.",
+ 			RELOPT_KIND_GIN
+ 		},
+ 		-1, 64, MAX_KILOBYTES
+ 	},
  
  	/* list terminator */
  	{{NULL}}
*** a/src/backend/access/gin/ginfast.c
--- b/src/backend/access/gin/ginfast.c
***************
*** 25,30 ****
--- 25,32 ----
  #include "utils/memutils.h"
  #include "utils/rel.h"
  
+ /* GUC parameter */
+ int			pending_list_cleanup_size = 0;
  
  #define GIN_PAGE_FREESIZE \
  	( BLCKSZ - MAXALIGN(SizeOfPageHeaderData) - MAXALIGN(sizeof(GinPageOpaqueData)) )
***************
*** 228,233 **** ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
--- 230,236 ----
  	ginxlogUpdateMeta data;
  	bool		separateList = false;
  	bool		needCleanup = false;
+ 	int			cleanupSize;
  
  	if (collector->ntuples == 0)
  		return;
***************
*** 422,432 **** ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
  	 * ginInsertCleanup could take significant amount of time, so we prefer to
  	 * call it when it can do all the work in a single collection cycle. In
  	 * non-vacuum mode, it shouldn't require maintenance_work_mem, so fire it
! 	 * while pending list is still small enough to fit into work_mem.
  	 *
  	 * ginInsertCleanup() should not be called inside our CRIT_SECTION.
  	 */
! 	if (metadata->nPendingPages * GIN_PAGE_FREESIZE > work_mem * 1024L)
  		needCleanup = true;
  
  	UnlockReleaseBuffer(metabuffer);
--- 425,437 ----
  	 * ginInsertCleanup could take significant amount of time, so we prefer to
  	 * call it when it can do all the work in a single collection cycle. In
  	 * non-vacuum mode, it shouldn't require maintenance_work_mem, so fire it
! 	 * while pending list is still small enough to fit into
! 	 * pending_list_cleanup_size.
  	 *
  	 * ginInsertCleanup() should not be called inside our CRIT_SECTION.
  	 */
! 	cleanupSize = GinGetPendingListCleanupSize(index);
! 	if (metadata->nPendingPages * GIN_PAGE_FREESIZE > cleanupSize * 1024L)
  		needCleanup = true;
  
  	UnlockReleaseBuffer(metabuffer);
*** a/src/backend/access/gin/ginutil.c
--- b/src/backend/access/gin/ginutil.c
***************
*** 525,531 **** ginoptions(PG_FUNCTION_ARGS)
  	GinOptions *rdopts;
  	int			numoptions;
  	static const relopt_parse_elt tab[] = {
! 		{"fastupdate", RELOPT_TYPE_BOOL, offsetof(GinOptions, useFastUpdate)}
  	};
  
  	options = parseRelOptions(reloptions, validate, RELOPT_KIND_GIN,
--- 525,533 ----
  	GinOptions *rdopts;
  	int			numoptions;
  	static const relopt_parse_elt tab[] = {
! 		{"fastupdate", RELOPT_TYPE_BOOL, offsetof(GinOptions, useFastUpdate)},
! 		{"pending_list_cleanup_size", RELOPT_TYPE_INT, offsetof(GinOptions,
! 																pendingListCleanupSize)}
  	};
  
  	options = parseRelOptions(reloptions, validate, RELOPT_KIND_GIN,
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
***************
*** 96,109 ****
  #define CONFIG_EXEC_PARAMS_NEW "global/config_exec_params.new"
  #endif
  
- /* upper limit for GUC variables measured in kilobytes of memory */
- /* note that various places assume the byte size fits in a "long" variable */
- #if SIZEOF_SIZE_T > 4 && SIZEOF_LONG > 4
- #define MAX_KILOBYTES	INT_MAX
- #else
- #define MAX_KILOBYTES	(INT_MAX / 1024)
- #endif
- 
  #define KB_PER_MB (1024)
  #define KB_PER_GB (1024*1024)
  #define KB_PER_TB (1024*1024*1024)
--- 96,101 ----
***************
*** 2550,2555 **** static struct config_int ConfigureNamesInt[] =
--- 2542,2558 ----
  		NULL, NULL, NULL
  	},
  
+ 	{
+ 		{"pending_list_cleanup_size", PGC_USERSET, CLIENT_CONN_STATEMENT,
+ 			gettext_noop("Sets the maximum size of the pending list for a GIN index."),
+ 			 NULL,
+ 			GUC_UNIT_KB
+ 		},
+ 		&pending_list_cleanup_size,
+ 		4096, 64, MAX_KILOBYTES,
+ 		NULL, NULL, NULL
+ 	},
+ 
  	/* End-of-list marker */
  	{
  		{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
*** a/src/backend/utils/misc/postgresql.conf.sample
--- b/src/backend/utils/misc/postgresql.conf.sample
***************
*** 519,524 ****
--- 519,525 ----
  #bytea_output = 'hex'			# hex, escape
  #xmlbinary = 'base64'
  #xmloption = 'content'
+ #pending_list_cleanup_size = 4MB
  
  # - Locale and Formatting -
  
*** a/src/bin/psql/tab-complete.c
--- b/src/bin/psql/tab-complete.c
***************
*** 1172,1178 **** psql_completion(const char *text, int start, int end)
  			 pg_strcasecmp(prev_wd, "(") == 0)
  	{
  		static const char *const list_INDEXOPTIONS[] =
! 		{"fillfactor", "fastupdate", NULL};
  
  		COMPLETE_WITH_LIST(list_INDEXOPTIONS);
  	}
--- 1172,1178 ----
  			 pg_strcasecmp(prev_wd, "(") == 0)
  	{
  		static const char *const list_INDEXOPTIONS[] =
! 		{"fillfactor", "fastupdate", "pending_list_cleanup_size", NULL};
  
  		COMPLETE_WITH_LIST(list_INDEXOPTIONS);
  	}
*** a/src/include/access/gin.h
--- b/src/include/access/gin.h
***************
*** 65,72 **** typedef char GinTernaryValue;
  #define GinTernaryValueGetDatum(X) ((Datum)(X))
  #define PG_RETURN_GIN_TERNARY_VALUE(x) return GinTernaryValueGetDatum(x)
  
! /* GUC parameter */
  extern PGDLLIMPORT int GinFuzzySearchLimit;
  
  /* ginutil.c */
  extern void ginGetStats(Relation index, GinStatsData *stats);
--- 65,73 ----
  #define GinTernaryValueGetDatum(X) ((Datum)(X))
  #define PG_RETURN_GIN_TERNARY_VALUE(x) return GinTernaryValueGetDatum(x)
  
! /* GUC parameters */
  extern PGDLLIMPORT int GinFuzzySearchLimit;
+ extern int pending_list_cleanup_size;
  
  /* ginutil.c */
  extern void ginGetStats(Relation index, GinStatsData *stats);
*** a/src/include/access/gin_private.h
--- b/src/include/access/gin_private.h
***************
*** 315,326 **** typedef struct GinOptions
--- 315,332 ----
  {
  	int32		vl_len_;		/* varlena header (do not touch directly!) */
  	bool		useFastUpdate;	/* use fast updates? */
+ 	int			pendingListCleanupSize;	/* maximum size of pending list */
  } GinOptions;
  
  #define GIN_DEFAULT_USE_FASTUPDATE	true
  #define GinGetUseFastUpdate(relation) \
  	((relation)->rd_options ? \
  	 ((GinOptions *) (relation)->rd_options)->useFastUpdate : GIN_DEFAULT_USE_FASTUPDATE)
+ #define GinGetPendingListCleanupSize(relation) \
+ 	((relation)->rd_options && \
+ 	 ((GinOptions *) (relation)->rd_options)->pendingListCleanupSize != -1 ? \
+ 	 ((GinOptions *) (relation)->rd_options)->pendingListCleanupSize : \
+ 	 pending_list_cleanup_size)
  
  
  /* Macros for buffer lock/unlock operations */
*** a/src/include/utils/guc.h
--- b/src/include/utils/guc.h
***************
*** 18,23 ****
--- 18,31 ----
  #include "utils/array.h"
  
  
+ /* upper limit for GUC variables measured in kilobytes of memory */
+ /* note that various places assume the byte size fits in a "long" variable */
+ #if SIZEOF_SIZE_T > 4 && SIZEOF_LONG > 4
+ #define MAX_KILOBYTES	INT_MAX
+ #else
+ #define MAX_KILOBYTES	(INT_MAX / 1024)
+ #endif
+ 
  /*
   * Certain options can only be set at certain times. The rules are
   * like this:
*** a/src/test/regress/expected/create_index.out
--- b/src/test/regress/expected/create_index.out
***************
*** 2241,2246 **** SELECT COUNT(*) FROM array_gin_test WHERE a @> '{2}';
--- 2241,2259 ----
  
  DROP TABLE array_gin_test;
  --
+ -- Test GIN index's reloptions
+ --
+ CREATE INDEX gin_relopts_test ON array_index_op_test USING gin (i)
+   WITH (FASTUPDATE=on, PENDING_LIST_CLEANUP_SIZE=128);
+ \d+ gin_relopts_test
+      Index "public.gin_relopts_test"
+  Column |  Type   | Definition | Storage 
+ --------+---------+------------+---------
+  i      | integer | i          | plain
+ gin, for table "public.array_index_op_test"
+ Options: fastupdate=on, pending_list_cleanup_size=128
+ 
+ --
  -- HASH
  --
  CREATE INDEX hash_i4_index ON hash_i4_heap USING hash (random int4_ops);
*** a/src/test/regress/sql/create_index.sql
--- b/src/test/regress/sql/create_index.sql
***************
*** 655,660 **** SELECT COUNT(*) FROM array_gin_test WHERE a @> '{2}';
--- 655,667 ----
  DROP TABLE array_gin_test;
  
  --
+ -- Test GIN index's reloptions
+ --
+ CREATE INDEX gin_relopts_test ON array_index_op_test USING gin (i)
+   WITH (FASTUPDATE=on, PENDING_LIST_CLEANUP_SIZE=128);
+ \d+ gin_relopts_test
+ 
+ --
  -- HASH
  --
  CREATE INDEX hash_i4_index ON hash_i4_heap USING hash (random int4_ops);
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
