The attached patch adds GUCs to control the use of the abbreviated keys
optimization when sorting. Also, I changed the TRUST_STRXFRM from a
#define into a GUC.

One reason for these GUCs is to make it easier to diagnose any issues
that come up with my collation work. Another is that I observed cases
with ICU where the abbreviated keys optimization resulted in a ~8-10%
regression, and it would be good to have some basic control over it.

I made them developer options because they are more about diagnosing
and I don't expect users to change these in production. If the issues
with abbreviated keys get more complex (where maybe we need to consider
costing each provider?), we can make it more user-facing.

This is fairly simple, so I plan to commit soon.

-- 
Jeff Davis
PostgreSQL Contributor Team - AWS


From 7699f28634f04772c718ac465bbbff48b849f2bc Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Sat, 21 Jan 2023 12:44:07 -0800
Subject: [PATCH v1] Introduce GUCs to control abbreviated keys sort
 optimization.

The setting sort_abbreviated_keys turns the optimization on or off
overall. The optimization relies on collation providers, which are
complex dependencies, and the performance of the optimization may rely
on many factors. Introducing a GUC allows easier diagnosis when this
optimization results in worse perforamnce.

The setting trust_strxfrm replaces the define TRUST_STRXFRM, allowing
users to experiment with the abbreviated keys optimization when using
the libc provider. Previously, the optimization only applied to
collations using the ICU provider unless specially compiled. By
default, allowed only for superusers (because an incorrect setting
could lead to wrong results), but can be granted to others.
---
 doc/src/sgml/config.sgml                   | 40 ++++++++++++++++++++++
 src/backend/utils/adt/varlena.c            | 10 +++---
 src/backend/utils/misc/guc_tables.c        | 24 +++++++++++++
 src/backend/utils/sort/tuplesortvariants.c | 17 ++++++---
 4 files changed, 82 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index dc9b78b0b7..47ee3ec6e7 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -11248,6 +11248,46 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-sort-abbreviated-keys" xreflabel="sort_abbreviated_keys">
+      <term><varname>sort_abbreviated_keys</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>sort_abbreviated_keys</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables or disables the use of abbreviated sort keys, an optimization,
+        if applicable. The default is <literal>true</literal>. Disabling may
+        be useful to diagnose problems or measure performance.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-trust-strxfrm" xreflabel="trust_strxfrm">
+      <term><varname>trust_strxfrm</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>trust_strxfrm</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Abbreviated keys, a sort optimization, depends on correct behavior of
+        the operating system function <function>strxfrm()</function> when
+        using a collation with the <literal>libc</literal> provider. On some
+        platforms <function>strxfrm()</function> does not return results
+        consistent with <function>strcoll()</function>, which means the
+        optimization could return wrong results. Set to
+        <literal>true</literal> if certain that <function>strxfrm()</function>
+        can be trusted.
+       </para>
+       <para>
+        The default value is <literal>false</literal>. This setting has no
+        effect if <xref linkend="guc-sort-abbreviated-keys"/> is set to
+        <literal>false</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-trace-locks" xreflabel="trace_locks">
       <term><varname>trace_locks</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 33ffdb013a..8b5b467205 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -45,6 +45,9 @@
 /* GUC variable */
 int			bytea_output = BYTEA_OUTPUT_HEX;
 
+/* GUC to enable use of strxfrm() for abbreviated keys */
+bool		trust_strxfrm = false;
+
 typedef struct varlena unknown;
 typedef struct varlena VarString;
 
@@ -2092,7 +2095,7 @@ varstr_sortsupport(SortSupport ssup, Oid typid, Oid collid)
 	 * other libc other than Cygwin has so far been shown to have a problem,
 	 * we take the conservative course of action for right now and disable
 	 * this categorically.  (Users who are certain this isn't a problem on
-	 * their system can define TRUST_STRXFRM.)
+	 * their system can set the trust_strxfrm GUC to true.)
 	 *
 	 * Even apart from the risk of broken locales, it's possible that there
 	 * are platforms where the use of abbreviated keys should be disabled at
@@ -2105,10 +2108,9 @@ varstr_sortsupport(SortSupport ssup, Oid typid, Oid collid)
 	 * categorically, we may still want or need to disable it for particular
 	 * platforms.
 	 */
-#ifndef TRUST_STRXFRM
-	if (!collate_c && !(locale && locale->provider == COLLPROVIDER_ICU))
+	if (!trust_strxfrm && !collate_c &&
+		!(locale && locale->provider == COLLPROVIDER_ICU))
 		abbreviate = false;
-#endif
 
 	/*
 	 * If we're using abbreviated keys, or if we're using a locale-aware
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 4ac808ed22..fd4a02fbf5 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -102,6 +102,8 @@ extern bool trace_syncscan;
 #ifdef DEBUG_BOUNDED_SORT
 extern bool optimize_bounded_sort;
 #endif
+extern bool sort_abbreviated_keys;
+extern bool trust_strxfrm;
 
 /*
  * Options for enum values defined in this module.
@@ -1673,6 +1675,28 @@ struct config_bool ConfigureNamesBool[] =
 	},
 #endif
 
+	{
+		{"sort_abbreviated_keys", PGC_USERSET, DEVELOPER_OPTIONS,
+			gettext_noop("Enables the use of abbreviated sort keys."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_EXPLAIN
+		},
+		&sort_abbreviated_keys,
+		true,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"trust_strxfrm", PGC_SUSET, DEVELOPER_OPTIONS,
+		 gettext_noop("Allow use of strxfrm() for abbreviated keys optimization for libc provider."),
+		 NULL,
+		 GUC_NOT_IN_SAMPLE
+		},
+		&trust_strxfrm,
+		false,
+		NULL, NULL, NULL
+	},
+
 #ifdef WAL_DEBUG
 	{
 		{"wal_debug", PGC_SUSET, DEVELOPER_OPTIONS,
diff --git a/src/backend/utils/sort/tuplesortvariants.c b/src/backend/utils/sort/tuplesortvariants.c
index eb6cfcfd00..ba16779f97 100644
--- a/src/backend/utils/sort/tuplesortvariants.c
+++ b/src/backend/utils/sort/tuplesortvariants.c
@@ -37,6 +37,8 @@
 #define DATUM_SORT		2
 #define CLUSTER_SORT	3
 
+bool sort_abbreviated_keys = true;
+
 static void removeabbrev_heap(Tuplesortstate *state, SortTuple *stups,
 							  int count);
 static void removeabbrev_cluster(Tuplesortstate *state, SortTuple *stups,
@@ -185,7 +187,8 @@ tuplesort_begin_heap(TupleDesc tupDesc,
 		sortKey->ssup_nulls_first = nullsFirstFlags[i];
 		sortKey->ssup_attno = attNums[i];
 		/* Convey if abbreviation optimization is applicable in principle */
-		sortKey->abbreviate = (i == 0 && base->haveDatum1);
+		if (sort_abbreviated_keys)
+			sortKey->abbreviate = (i == 0 && base->haveDatum1);
 
 		PrepareSortSupportFromOrderingOp(sortOperators[i], sortKey);
 	}
@@ -295,7 +298,8 @@ tuplesort_begin_cluster(TupleDesc tupDesc,
 			(scanKey->sk_flags & SK_BT_NULLS_FIRST) != 0;
 		sortKey->ssup_attno = scanKey->sk_attno;
 		/* Convey if abbreviation optimization is applicable in principle */
-		sortKey->abbreviate = (i == 0 && base->haveDatum1);
+		if (sort_abbreviated_keys)
+			sortKey->abbreviate = (i == 0 && base->haveDatum1);
 
 		Assert(sortKey->ssup_attno != 0);
 
@@ -379,7 +383,8 @@ tuplesort_begin_index_btree(Relation heapRel,
 			(scanKey->sk_flags & SK_BT_NULLS_FIRST) != 0;
 		sortKey->ssup_attno = scanKey->sk_attno;
 		/* Convey if abbreviation optimization is applicable in principle */
-		sortKey->abbreviate = (i == 0 && base->haveDatum1);
+		if (sort_abbreviated_keys)
+			sortKey->abbreviate = (i == 0 && base->haveDatum1);
 
 		Assert(sortKey->ssup_attno != 0);
 
@@ -499,7 +504,8 @@ tuplesort_begin_index_gist(Relation heapRel,
 		sortKey->ssup_nulls_first = false;
 		sortKey->ssup_attno = i + 1;
 		/* Convey if abbreviation optimization is applicable in principle */
-		sortKey->abbreviate = (i == 0 && base->haveDatum1);
+		if (sort_abbreviated_keys)
+			sortKey->abbreviate = (i == 0 && base->haveDatum1);
 
 		Assert(sortKey->ssup_attno != 0);
 
@@ -573,7 +579,8 @@ tuplesort_begin_datum(Oid datumType, Oid sortOperator, Oid sortCollation,
 	 * can't, because a datum sort only stores a single copy of the datum; the
 	 * "tuple" field of each SortTuple is NULL.
 	 */
-	base->sortKeys->abbreviate = !typbyval;
+	if (sort_abbreviated_keys)
+		base->sortKeys->abbreviate = !typbyval;
 
 	PrepareSortSupportFromOrderingOp(sortOperator, base->sortKeys);
 
-- 
2.34.1

Reply via email to