On 24/06/18 13:22, Adrien Nayrat wrote:
> Attached third version of the patch with documentation.

Hi.  I'm reviewing this.

>               exceeded = (log_min_duration_statement == 0 ||
>                                       (log_min_duration_statement > 0 &&
>                                        (secs > log_min_duration_statement / 
> 1000 ||
> -                                       secs * 1000 + msecs >= 
> log_min_duration_statement)));
> +                                       secs * 1000 + msecs >= 
> log_min_duration_statement))) &&
> +                                log_sample_rate != 0 && (log_sample_rate == 
> 1 ||
> +                                random() <= log_sample_rate * 
> MAX_RANDOM_VALUE);

A few notes on this part, which is the crux of the patch.

1) Is there an off-by-one error here?  drandom() has the formula

    (double) random() / ((double) MAX_RANDOM_VALUE + 1)

but yours doesn't look like that.

2) I think the whole thing should be separated into its own expression
instead of tagging along on an already hard to read expression.

3) Is it intentional to only sample with log_min_duration_statement and
not also with log_duration?  It seems like it should affect both.  In
both cases, the name is too generic.  Something called "log_sample_rate"
should sample **everything**.

Otherwise I think it's good.  Attached is a revised patch that fixes 2)
as well as some wordsmithing throughout.  It does not deal with the
other issues I raised.

Marked "Waiting on Author".
-- 
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7bfbc87109..90bbe6d423 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5190,6 +5190,26 @@ local0.*    /var/log/postgresql
        </listitem>
       </varlistentry>
 
+     <varlistentry id="guc-log-sample-rate" xreflabel="log_sample_rate">
+      <term><varname>log_sample_rate</varname> (<type>real</type>)
+      <indexterm>
+       <primary><varname>log_sample_rate</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+       <listitem>
+        <para>
+         Causes logging only a fraction of the statements when <xref
+         linkend="log_min_duration_statement"/> is used. The default is
+         <literal>1</literal>, meaning log all statements longer than
+         <varname>log_min_duration_statement</varname>.  Setting this to zero
+         disables logging, same as setting
+         <varname>log_min_duration_statement</varname> to
+         <literal>-1</literal>.  Using <varname>log_sample_rate</varname> is
+         helpful when the traffic is too high to log all queries.
+        </para>
+       </listitem>
+      </varlistentry>
+
      </variablelist>
 
     <para>
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index f4133953be..5590f9a9d4 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2115,7 +2115,8 @@ check_log_statement(List *stmt_list)
 
 /*
  * check_log_duration
- *		Determine whether current command's duration should be logged
+ *		Determine whether current command's duration should be logged.
+ *		If log_sample_rate < 1.0, log only a sample.
  *
  * Returns:
  *		0 if no logging is needed
@@ -2137,6 +2138,7 @@ check_log_duration(char *msec_str, bool was_logged)
 		int			usecs;
 		int			msecs;
 		bool		exceeded;
+		bool		in_sample;
 
 		TimestampDifference(GetCurrentStatementStartTimestamp(),
 							GetCurrentTimestamp(),
@@ -2153,7 +2155,15 @@ check_log_duration(char *msec_str, bool was_logged)
 					 (secs > log_min_duration_statement / 1000 ||
 					  secs * 1000 + msecs >= log_min_duration_statement)));
 
-		if (exceeded || log_duration)
+		/*
+		 * Do not log if log_sample_rate = 0.
+		 * Log a sample if log_sample_rate <= 1 and avoid unecessary random()
+		 * call if log_sample_rate = 1.
+		 */
+		in_sample = log_sample_rate != 0 &&
+			(log_sample_rate == 1 || random() <= log_sample_rate * MAX_RANDOM_VALUE);
+
+		if (exceeded && in_sample || log_duration)
 		{
 			snprintf(msec_str, 32, "%ld.%03d",
 					 secs * 1000 + msecs, usecs % 1000);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 859ef931e7..e89944789e 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -455,6 +455,7 @@ int			log_min_messages = WARNING;
 int			client_min_messages = NOTICE;
 int			log_min_duration_statement = -1;
 int			log_temp_files = -1;
+double 		log_sample_rate = 1.0;
 int			trace_recovery_messages = LOG;
 
 int			temp_file_limit = -1;
@@ -3257,6 +3258,17 @@ static struct config_real ConfigureNamesReal[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"log_sample_rate", PGC_SUSET, LOGGING_WHEN,
+			gettext_noop("Fraction of statements to log."),
+			gettext_noop("1.0 logs all statements."),
+			NULL
+		},
+		&log_sample_rate,
+		1.0, 0.0, 1.0,
+		NULL, NULL, NULL
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0.0, 0.0, 0.0, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9e39baf466..3f9a016003 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -447,6 +447,8 @@
 					# statements running at least this number
 					# of milliseconds
 
+#log_sample_rate = 1  # Fraction of logged statements. 1 means log all
+					# statements.
 
 # - What to Log -
 
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index f462eabe59..f50a0d1f1b 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -252,6 +252,7 @@ extern PGDLLIMPORT int log_min_messages;
 extern PGDLLIMPORT int client_min_messages;
 extern int	log_min_duration_statement;
 extern int	log_temp_files;
+extern double	log_sample_rate;
 
 extern int	temp_file_limit;
 

Reply via email to