On Sat, Aug 16, 2014 at 4:23 PM, Sawada Masahiko <sawada.m...@gmail.com> wrote:
> On Fri, Aug 8, 2014 at 11:45 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
>> On Fri, Aug 8, 2014 at 11:43 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
>>> On Mon, Apr 14, 2014 at 11:40 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>>> Fujii Masao <masao.fu...@gmail.com> writes:
>>>>> On Tue, Apr 1, 2014 at 1:41 AM, Robert Haas <robertmh...@gmail.com> wrote:
>>>>>> Should we try to install some hack around fastupdate for 9.4?  I fear
>>>>>> the divergence between reasonable values of work_mem and reasonable
>>>>>> sizes for that list is only going to continue to get bigger.  I'm sure
>>>>>> there's somebody out there who has work_mem = 16GB, and stuff like
>>>>>> 263865a48973767ce8ed7b7788059a38a24a9f37 is only going to increase the
>>>>>> appeal of large values.
>>>>
>>>>> Controlling the threshold of the size of pending list only by GUC doesn't
>>>>> seem reasonable. Users may want to increase the threshold only for the
>>>>> GIN index which can be updated heavily, and decrease it otherwise. So
>>>>> I think that it's better to add new storage parameter for GIN index to 
>>>>> control
>>>>> the threshold, or both storage parameter and GUC.
>>>>
>>>> Yeah, -1 for a GUC.  A GIN-index-specific storage parameter seems more
>>>> appropriate.
>>>
>>> The attached patch introduces the GIN index storage parameter
>>> "PENDING_LIST_CLEANUP_SIZE" which specifies the maximum size of
>>> GIN pending list. If it's not set, work_mem is used as that maximum size,
>>> instead. So this patch doesn't break the existing application which
>>> currently uses work_mem as the threshold of cleanup operation of
>>> the pending list. It has only not to set PENDING_LIST_CLEANUP_SIZE.
>>>
>>> This is an index storage parameter, and allows us to specify each
>>> threshold per GIN index. So, for example, we can increase the threshold
>>> only for the GIN index which can be updated heavily, and decrease it 
>>> otherwise.
>>>
>>> This patch uses another patch that I proposed (*1) as an infrastructure.
>>> Please apply that infrastructure patch first if you apply this patch.
>>>
>>> (*1)
>>> http://www.postgresql.org/message-id/CAHGQGwEanQ_e8WLHL25=bm_8z5zkyzw0k0yir+kdmv2hgne...@mail.gmail.com
>>>
>>> Regards,
>>>
>>> --
>>> Fujii Masao
>>
>> Sorry, I forgot to attached the patch.... This time, attached.
>>
>
> I think that this patch should be rebased.
> It contains garbage code.

Thanks for reviewing the patch! ISTM that I failed to make the patch from
my git repository... Attached is the rebased version.

Regards,

-- 
Fujii Masao
diff --git a/doc/src/sgml/gin.sgml b/doc/src/sgml/gin.sgml
index 80a578d..24c8dc1 100644
--- a/doc/src/sgml/gin.sgml
+++ b/doc/src/sgml/gin.sgml
@@ -728,8 +728,9 @@
    from the indexed item). As of <productname>PostgreSQL</productname> 8.4,
    <acronym>GIN</> is capable of postponing much of this work by inserting
    new tuples into a temporary, unsorted list of pending entries.
-   When the table is vacuumed, or if the pending list becomes too large
-   (larger than <xref linkend="guc-work-mem">), the entries are moved to the
+   When the table is vacuumed, or if the pending list becomes larger than
+   <literal>PENDING_LIST_CLEANUP_SIZE</literal> (or
+   <xref linkend="guc-work-mem"> if not set), the entries are moved to the
    main <acronym>GIN</acronym> data structure using the same bulk insert
    techniques used during initial index creation.  This greatly improves
    <acronym>GIN</acronym> index update speed, even counting the additional
@@ -812,18 +813,27 @@
   </varlistentry>
 
   <varlistentry>
-   <term><xref linkend="guc-work-mem"></term>
+   <term><literal>PENDING_LIST_CLEANUP_SIZE</> and
+   <xref linkend="guc-work-mem"></term>
    <listitem>
     <para>
      During a series of insertions into an existing <acronym>GIN</acronym>
      index that has <literal>FASTUPDATE</> enabled, the system will clean up
      the pending-entry list whenever the list grows larger than
-     <varname>work_mem</>.  To avoid fluctuations in observed response time,
-     it's desirable to have pending-list cleanup occur in the background
-     (i.e., via autovacuum).  Foreground cleanup operations can be avoided by
-     increasing <varname>work_mem</> or making autovacuum more aggressive.
-     However, enlarging <varname>work_mem</> means that if a foreground
-     cleanup does occur, it will take even longer.
+     <literal>PENDING_LIST_CLEANUP_SIZE</> (if not set, <varname>work_mem</>
+     is used as that threshold, instead). To avoid fluctuations in observed
+     response time, it's desirable to have pending-list cleanup occur in the
+     background (i.e., via autovacuum).  Foreground cleanup operations
+     can be avoided by increasing <literal>PENDING_LIST_CLEANUP_SIZE</>
+     (or <varname>work_mem</>) or making autovacuum more aggressive.
+     However, enlarging the threshold of the cleanup operation means that
+     if a foreground cleanup does occur, it will take even longer.
+    </para>
+    <para>
+     <literal>PENDING_LIST_CLEANUP_SIZE</> is an index storage parameter,
+     and allows each GIN index to have its own cleanup threshold.
+     For example, it's possible to increase the threshold only for the GIN
+     index which can be updated heavily, and decrease it otherwise.
     </para>
    </listitem>
   </varlistentry>
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index e469b17..1589812 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -356,6 +356,22 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ <replaceable class="parameter">name</
     </listitem>
    </varlistentry>
    </variablelist>
+   <variablelist>
+   <varlistentry>
+    <term><literal>PENDING_LIST_CLEANUP_SIZE</></term>
+    <listitem>
+    <para>
+     This setting specifies the maximum size of the GIN pending list which is
+     used when <literal>FASTUPDATE</> is enabled. If the list grows larger than
+     this maximum size, it is cleaned up by moving the entries in it to the
+     main GIN data structure in bulk. The value is specified in kilobytes.
+     If this is not set, <literal>work_mem</> is used as the maximum size
+     of the pending list, instead. See <xref linkend="gin-fast-update"> and
+     <xref linkend="gin-tips"> for more information.
+    </para>
+    </listitem>
+   </varlistentry>
+   </variablelist>
   </refsect2>
 
   <refsect2 id="SQL-CREATEINDEX-CONCURRENTLY">
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 9a03fdc..6ffe7e8 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -209,6 +209,14 @@ static relopt_int intRelOpts[] =
 			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
 		}, -1, 0, 2000000000, 0
 	},
+	{
+		{
+			"pending_list_cleanup_size",
+			"Maximum size of the pending list for this GIN index, in kilobytes.",
+			RELOPT_KIND_GIN
+		},
+		-1, 0, MAX_KILOBYTES, GUC_UNIT_KB
+	},
 
 	/* list terminator */
 	{{NULL}}
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index 09c3e39..230ef17 100644
--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
@@ -227,6 +227,7 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
 	ginxlogUpdateMeta data;
 	bool		separateList = false;
 	bool		needCleanup = false;
+	int			cleanupSize;
 
 	if (collector->ntuples == 0)
 		return;
@@ -421,11 +422,15 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
 	 * ginInsertCleanup could take significant amount of time, so we prefer to
 	 * call it when it can do all the work in a single collection cycle. In
 	 * non-vacuum mode, it shouldn't require maintenance_work_mem, so fire it
-	 * while pending list is still small enough to fit into work_mem.
+	 * while pending list is still small enough to fit into
+	 * pending_list_cleanup_size (or work_mem if not set).
 	 *
 	 * ginInsertCleanup() should not be called inside our CRIT_SECTION.
 	 */
-	if (metadata->nPendingPages * GIN_PAGE_FREESIZE > work_mem * 1024L)
+	cleanupSize = GinGetPendingListCleanupSize(index);
+	if (cleanupSize == GIN_DEFAULT_PENDING_LIST_CLEANUP_SIZE)
+		cleanupSize = work_mem;
+	if (metadata->nPendingPages * GIN_PAGE_FREESIZE > cleanupSize * 1024L)
 		needCleanup = true;
 
 	UnlockReleaseBuffer(metabuffer);
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 3ca0b68..a1e959a 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -524,7 +524,9 @@ ginoptions(PG_FUNCTION_ARGS)
 	GinOptions *rdopts;
 	int			numoptions;
 	static const relopt_parse_elt tab[] = {
-		{"fastupdate", RELOPT_TYPE_BOOL, offsetof(GinOptions, useFastUpdate)}
+		{"fastupdate", RELOPT_TYPE_BOOL, offsetof(GinOptions, useFastUpdate)},
+		{"pending_list_cleanup_size", RELOPT_TYPE_INT, offsetof(GinOptions,
+																pendingListCleanupSize)}
 	};
 
 	options = parseRelOptions(reloptions, validate, RELOPT_KIND_GIN,
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 60e4354..973127e 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -95,14 +95,6 @@
 #define CONFIG_EXEC_PARAMS_NEW "global/config_exec_params.new"
 #endif
 
-/* upper limit for GUC variables measured in kilobytes of memory */
-/* note that various places assume the byte size fits in a "long" variable */
-#if SIZEOF_SIZE_T > 4 && SIZEOF_LONG > 4
-#define MAX_KILOBYTES	INT_MAX
-#else
-#define MAX_KILOBYTES	(INT_MAX / 1024)
-#endif
-
 #define KB_PER_MB (1024)
 #define KB_PER_GB (1024*1024)
 #define KB_PER_TB (1024*1024*1024)
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index b4f1856..543ad38 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1137,7 +1137,7 @@ psql_completion(const char *text, int start, int end)
 			 pg_strcasecmp(prev_wd, "(") == 0)
 	{
 		static const char *const list_INDEXOPTIONS[] =
-		{"fillfactor", "fastupdate", NULL};
+		{"fillfactor", "fastupdate", "pending_list_cleanup_size", NULL};
 
 		COMPLETE_WITH_LIST(list_INDEXOPTIONS);
 	}
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index 3baa9f5..ad0777d 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -314,12 +314,18 @@ typedef struct GinOptions
 {
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	bool		useFastUpdate;	/* use fast updates? */
+	int			pendingListCleanupSize;	/* maximum size of pending list */
 } GinOptions;
 
 #define GIN_DEFAULT_USE_FASTUPDATE	true
 #define GinGetUseFastUpdate(relation) \
 	((relation)->rd_options ? \
 	 ((GinOptions *) (relation)->rd_options)->useFastUpdate : GIN_DEFAULT_USE_FASTUPDATE)
+#define GIN_DEFAULT_PENDING_LIST_CLEANUP_SIZE	-1
+#define GinGetPendingListCleanupSize(relation) \
+	((relation)->rd_options ? \
+	 ((GinOptions *) (relation)->rd_options)->pendingListCleanupSize : \
+	 GIN_DEFAULT_PENDING_LIST_CLEANUP_SIZE)
 
 
 /* Macros for buffer lock/unlock operations */
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 0a729c1..5b5bd77 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -18,6 +18,14 @@
 #include "utils/array.h"
 
 
+/* upper limit for GUC variables measured in kilobytes of memory */
+/* note that various places assume the byte size fits in a "long" variable */
+#if SIZEOF_SIZE_T > 4 && SIZEOF_LONG > 4
+#define MAX_KILOBYTES	INT_MAX
+#else
+#define MAX_KILOBYTES	(INT_MAX / 1024)
+#endif
+
 /*
  * Certain options can only be set at certain times. The rules are
  * like this:
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to