On Wed, Mar 26, 2014 at 8:08 PM, Peter Geoghegan <p...@heroku.com> wrote:
> The API I envisage is a new support function 3 that operator class
> authors may optionally provide.

I've built a prototype patch, attached, that extends SortSupport and
tuplesort to support "poor man's normalized keys". All the regression
tests pass, so while it's just a proof of concept, it is reasonably
well put together for one. The primary shortcoming of the prototype
(the main reason why I'm calling it a prototype rather than just a
patch) is that it isn't sufficiently generalized (i.e. it only works
for the cases currently covered by SortSupport - not B-Tree index
builds, or B-Tree scanKeys). There is no B-Tree support function
number 3 in the patch. I didn't spend too long on this.

I'm pretty happy with the results for in-memory sorting of text (my
development system uses 'en_US.UTF8', so please assume that any costs
involved are for runs that use that collation). With the dellstore2
sample database [1] restored to my local development instance, the
following example demonstrates just how much the technique can help
performance.

With master:

pg@hamster:~/sort-tests$ cat sort.sql
select * from (select * from customers order by firstname offset 100000) d;
pg@hamster:~/sort-tests$ pgbench -f sort.sql -n -T 100
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 100 s
number of transactions actually processed: 819
latency average: 122.100 ms
tps = 8.186197 (including connections establishing)
tps = 8.186522 (excluding connections establishing)

With patch applied (requires initdb for new text SortSupport pg_proc entry):

pg@hamster:~/sort-tests$ cat sort.sql
select * from (select * from customers order by firstname offset 100000) d;
pg@hamster:~/sort-tests$ pgbench -f sort.sql -n -T 100
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 100 s
number of transactions actually processed: 2525
latency average: 39.604 ms
tps = 25.241723 (including connections establishing)
tps = 25.242447 (excluding connections establishing)

It looks like this technique is very valuable indeed, at least in the
average or best case. We're not just benefiting from following the
advice of the standard, and using strxfrm() for sorting to amortize
the cost of the strxfrm() transformation that strcoll() must do
anyway. It stands to reason that there is also a lot of benefit from
sorting tightly-packed keys. Quicksort is cache oblivious, and having
it sort tightly-packed binary data, as opposed to going through all of
that deferencing and deserialization indirection is probably also very
helpful. A tool like Cachegrind might offer some additional insights,
but I haven't gone to the trouble of trying that out.

(By the way, my earlier recollection about how memory-frugal
MinimalTuple/memtuple building is within tuplesort was incorrect, so
there are no savings in memory to be had here).

As I mentioned, something like a SortSupport for numeric, with poor
man's normalized keys might also be compelling. I suggest we focus on
how this technique can be further generalized, though. This prototype
patch is derivative of Robert's abandoned SortSupport for text patch.
If he wanted to take this off my hands, I'd have no objections - I
don't think I'm going to have time to take this as far as I'd like.

[1] http://pgfoundry.org/forum/forum.php?forum_id=603
-- 
Peter Geoghegan
*** a/src/backend/utils/adt/varlena.c
--- b/src/backend/utils/adt/varlena.c
***************
*** 17,22 ****
--- 17,23 ----
  #include <ctype.h>
  #include <limits.h>
  
+ #include "access/nbtree.h"
  #include "access/tuptoaster.h"
  #include "catalog/pg_collation.h"
  #include "catalog/pg_type.h"
***************
*** 29,34 ****
--- 30,36 ----
  #include "utils/bytea.h"
  #include "utils/lsyscache.h"
  #include "utils/pg_locale.h"
+ #include "utils/sortsupport.h"
  
  
  /* GUC variable */
*************** typedef struct
*** 50,61 ****
--- 52,85 ----
  	int			skiptable[256]; /* skip distance for given mismatched char */
  } TextPositionState;
  
+ typedef struct
+ {
+ 	char	   *buf1;			/* Also used as strxfrm() scratch-space */
+ 	char	   *buf2;			/* Unused by leading key/poor man case */
+ 	int			buflen1;
+ 	int			buflen2;
+ #ifdef HAVE_LOCALE_T
+ 	pg_locale_t locale;
+ #endif
+ } TextSortSupport;
+ 
+ /*
+  * This should be large enough that most strings will be fit, but small enough
+  * that we feel comfortable putting it on the stack
+  */
+ #define TEXTBUFLEN		1024
+ 
  #define DatumGetUnknownP(X)			((unknown *) PG_DETOAST_DATUM(X))
  #define DatumGetUnknownPCopy(X)		((unknown *) PG_DETOAST_DATUM_COPY(X))
  #define PG_GETARG_UNKNOWN_P(n)		DatumGetUnknownP(PG_GETARG_DATUM(n))
  #define PG_GETARG_UNKNOWN_P_COPY(n) DatumGetUnknownPCopy(PG_GETARG_DATUM(n))
  #define PG_RETURN_UNKNOWN_P(x)		PG_RETURN_POINTER(x)
  
+ static void btpoorman_worker(SortSupport ssup, Oid collid);
+ static int bttextfastcmp_c(Datum x, Datum y, SortSupport ssup);
+ static int bttextfastcmp_locale(Datum x, Datum y, SortSupport ssup);
+ static int bttextfastcmp_locale_poorman(Datum x, Datum y, SortSupport ssup);
+ static Datum bttext_convert(Datum d, SortSupport ssup);
  static int32 text_length(Datum str);
  static text *text_catenate(text *t1, text *t2);
  static text *text_substring(Datum str,
*************** varstr_cmp(char *arg1, int len1, char *a
*** 1356,1365 ****
  	}
  	else
  	{
! #define STACKBUFLEN		1024
! 
! 		char		a1buf[STACKBUFLEN];
! 		char		a2buf[STACKBUFLEN];
  		char	   *a1p,
  				   *a2p;
  
--- 1380,1387 ----
  	}
  	else
  	{
! 		char		a1buf[TEXTBUFLEN];
! 		char		a2buf[TEXTBUFLEN];
  		char	   *a1p,
  				   *a2p;
  
*************** varstr_cmp(char *arg1, int len1, char *a
*** 1393,1416 ****
  			int			a2len;
  			int			r;
  
! 			if (len1 >= STACKBUFLEN / 2)
  			{
  				a1len = len1 * 2 + 2;
  				a1p = palloc(a1len);
  			}
  			else
  			{
! 				a1len = STACKBUFLEN;
  				a1p = a1buf;
  			}
! 			if (len2 >= STACKBUFLEN / 2)
  			{
  				a2len = len2 * 2 + 2;
  				a2p = palloc(a2len);
  			}
  			else
  			{
! 				a2len = STACKBUFLEN;
  				a2p = a2buf;
  			}
  
--- 1415,1438 ----
  			int			a2len;
  			int			r;
  
! 			if (len1 >= TEXTBUFLEN / 2)
  			{
  				a1len = len1 * 2 + 2;
  				a1p = palloc(a1len);
  			}
  			else
  			{
! 				a1len = TEXTBUFLEN;
  				a1p = a1buf;
  			}
! 			if (len2 >= TEXTBUFLEN / 2)
  			{
  				a2len = len2 * 2 + 2;
  				a2p = palloc(a2len);
  			}
  			else
  			{
! 				a2len = TEXTBUFLEN;
  				a2p = a2buf;
  			}
  
*************** varstr_cmp(char *arg1, int len1, char *a
*** 1475,1485 ****
  		}
  #endif   /* WIN32 */
  
! 		if (len1 >= STACKBUFLEN)
  			a1p = (char *) palloc(len1 + 1);
  		else
  			a1p = a1buf;
! 		if (len2 >= STACKBUFLEN)
  			a2p = (char *) palloc(len2 + 1);
  		else
  			a2p = a2buf;
--- 1497,1507 ----
  		}
  #endif   /* WIN32 */
  
! 		if (len1 >= TEXTBUFLEN)
  			a1p = (char *) palloc(len1 + 1);
  		else
  			a1p = a1buf;
! 		if (len2 >= TEXTBUFLEN)
  			a2p = (char *) palloc(len2 + 1);
  		else
  			a2p = a2buf;
*************** bttextcmp(PG_FUNCTION_ARGS)
*** 1683,1688 ****
--- 1705,1990 ----
  	PG_RETURN_INT32(result);
  }
  
+ Datum
+ bttextsortsupport(PG_FUNCTION_ARGS)
+ {
+ 	SortSupport		ssup = (SortSupport) PG_GETARG_POINTER(0);
+ 	Oid				collid = ssup->ssup_collation;
+ 
+ 	btpoorman_worker(ssup, collid);
+ 
+ 	PG_RETURN_VOID();
+ }
+ 
+ static void
+ btpoorman_worker(SortSupport ssup, Oid collid)
+ {
+ 	TextSortSupport	   *tss;
+ 
+ 	/*
+ 	 * If LC_COLLATE = C, we can make things quite a bit faster by using
+ 	 * memcmp() rather than strcoll().  To minimize the per-comparison
+ 	 * overhead, we make this decision just once for the whole sort.  For now,
+ 	 * don't apply the "small string optimization" to C-locale, even though
+ 	 * that should work (without any strxfrm() processing).
+ 	 */
+ 	if (lc_collate_is_c(collid))
+ 	{
+ 		ssup->comparator = bttextfastcmp_c;
+ 		ssup->converter = NULL;
+ 		ssup->proper = NULL;
+ 		return;
+ 	}
+ 
+ 	/*
+ 	 * WIN32 requires complex hacks when the database encoding is UTF-8
+ 	 * (except when using the "C" collation).  For now, we don't optimize that
+ 	 * case.
+ 	 */
+ #ifdef WIN32
+ 	if (GetDatabaseEncoding() == PG_UTF8)
+ 		PG_RETURN_VOID();
+ #endif /* WIN32 */
+ 
+ 	/*
+ 	 * We need a collation-sensitive comparison.  To make things faster,
+ 	 * we'll figure out the collation based on the locale id and cache the
+ 	 * result.  Also, since strxfrm()/strcoll() require NULL-terminated inputs,
+ 	 * prepare one or two palloc'd buffers to use as temporary workspace.  In
+ 	 * the ad-hoc comparison case we only use palloc'd buffers when we need
+ 	 * more space than we're comfortable allocating on the stack, but here we
+ 	 * can keep the buffers around for the whole sort, so it makes sense to
+ 	 * allocate them once and use them unconditionally (although we won't
+ 	 * actually need them when sorting proper actually begins and strxfrm()
+ 	 * conversion has already occurred, when sorting a leading key).
+ 	 */
+ 	tss = MemoryContextAlloc(ssup->ssup_cxt, sizeof(TextSortSupport));
+ #ifdef HAVE_LOCALE_T
+ 	tss->locale = 0;
+ #endif
+ 
+ 	if (collid != DEFAULT_COLLATION_OID)
+ 	{
+ 		if (!OidIsValid(collid))
+ 		{
+ 			/*
+ 			 * This typically means that the parser could not resolve a
+ 			 * conflict of implicit collations, so report it that way.
+ 			 */
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_INDETERMINATE_COLLATION),
+ 					 errmsg("could not determine which collation to use for string comparison"),
+ 					 errhint("Use the COLLATE clause to set the collation explicitly.")));
+ 		}
+ #ifdef HAVE_LOCALE_T
+ 		tss->locale = pg_newlocale_from_collation(collid);
+ #endif
+ 	}
+ 
+ 	tss->buf1 = MemoryContextAlloc(ssup->ssup_cxt, TEXTBUFLEN);
+ 	tss->buflen1 = TEXTBUFLEN;
+ 
+ 	ssup->ssup_extra = tss;
+ 	if (!ssup->leading)
+ 	{
+ 		tss->buf2 = MemoryContextAlloc(ssup->ssup_cxt, TEXTBUFLEN);
+ 		tss->buflen2 = TEXTBUFLEN;
+ 		ssup->comparator = bttextfastcmp_locale;
+ 		ssup->converter = NULL;
+ 		ssup->proper = NULL;
+ 		return;
+ 	}
+ 
+ 	ssup->comparator = bttextfastcmp_locale_poorman;
+ 	ssup->converter = bttext_convert;
+ 
+ 	ssup->proper = MemoryContextAlloc(CurrentMemoryContext, sizeof(SortSupportData));
+ 	ssup->proper->ssup_cxt = ssup->ssup_cxt;
+ 	ssup->proper->ssup_collation = ssup->ssup_collation;
+ 	ssup->proper->ssup_reverse = ssup->ssup_reverse;
+ 	ssup->proper->ssup_nulls_first = ssup->ssup_nulls_first;
+ 	ssup->proper->ssup_attno = ssup->ssup_attno;
+ 
+ 	/*
+ 	 * Initialize the "proper" sortsupport state with a reliable
+ 	 * strcoll()-based comparator for tie-breaking.  This is the same thing
+ 	 * that this routine does directly for non-leading keys, which never have
+ 	 * poor man's normalized keys.
+ 	 *
+ 	 * Conceptually, this is the "true" leading key, which is technically not
+ 	 * actually leading.  The pseudo leading poor man's key is the leading key
+ 	 * for tuplesort's purposes.
+ 	 */
+ 	ssup->proper->leading = false;
+ 	btpoorman_worker(ssup->proper, collid);
+ }
+ 
+ static int
+ bttextfastcmp_c(Datum x, Datum y, SortSupport ssup)
+ {
+ 	text	   *arg1 = DatumGetTextPP(x);
+ 	text	   *arg2 = DatumGetTextPP(y);
+ 	char	   *a1p,
+ 			   *a2p;
+ 	int			len1,
+ 				len2,
+ 	result;
+ 
+ 	a1p = VARDATA_ANY(arg1);
+ 	a2p = VARDATA_ANY(arg2);
+ 
+ 	len1 = VARSIZE_ANY_EXHDR(arg1);
+ 	len2 = VARSIZE_ANY_EXHDR(arg2);
+ 
+ 	result = memcmp(a1p, a2p, Min(len1, len2));
+ 	if ((result == 0) && (len1 != len2))
+ 		result = (len1 < len2) ? -1 : 1;
+ 
+ 	/* We can't afford to leak memory here. */
+ 	if (PointerGetDatum(arg1) != x)
+ 		pfree(arg1);
+ 	if (PointerGetDatum(arg2) != y)
+ 		pfree(arg2);
+ 
+ 	return result;
+ }
+ 
+ static int
+ bttextfastcmp_locale(Datum x, Datum y, SortSupport ssup)
+ {
+ 	text	   *arg1 = DatumGetTextPP(x);
+ 	text	   *arg2 = DatumGetTextPP(y);
+ 	TextSortSupport	   *tss = (TextSortSupport *) ssup->ssup_extra;
+ 	char	   *a1p,
+ 			   *a2p;
+ 	int			len1,
+ 				len2,
+ 	result;
+ 
+ 	a1p = VARDATA_ANY(arg1);
+ 	a2p = VARDATA_ANY(arg2);
+ 
+ 	len1 = VARSIZE_ANY_EXHDR(arg1);
+ 	len2 = VARSIZE_ANY_EXHDR(arg2);
+ 
+ 	if (len1 >= tss->buflen1)
+ 	{
+ 		pfree(tss->buf1);
+ 		tss->buflen1 *= 2;
+ 		tss->buf1 = MemoryContextAlloc(ssup->ssup_cxt, tss->buflen1);
+ 	}
+ 	if (len2 >= tss->buflen2)
+ 	{
+ 		pfree(tss->buf2);
+ 		tss->buflen2 *= 2;
+ 		tss->buf2 = MemoryContextAlloc(ssup->ssup_cxt, tss->buflen2);
+ 	}
+ 
+ 	memcpy(tss->buf1, a1p, len1);
+ 	tss->buf1[len1] = '\0';
+ 	memcpy(tss->buf2, a2p, len2);
+ 	tss->buf2[len2] = '\0';
+ 
+ #ifdef HAVE_LOCALE_T
+ 	if (tss->locale)
+ 		result = strcoll_l(tss->buf1, tss->buf2, tss->locale);
+ 	else
+ #endif
+ 		result = strcoll(tss->buf1, tss->buf2);
+ 
+ 	/*
+ 	 * In some locales strcoll() can claim that nonidentical strings are equal.
+ 	 * Believing that would be bad news for a number of reasons, so we follow
+ 	 * Perl's lead and sort "equal" strings according to strcmp().
+ 	 */
+ 	if (result == 0)
+ 		result = strcmp(tss->buf1, tss->buf2);
+ 
+ 	/* We can't afford to leak memory here. */
+ 	if (PointerGetDatum(arg1) != x)
+ 		pfree(arg1);
+ 	if (PointerGetDatum(arg2) != y)
+ 		pfree(arg2);
+ 
+ 	return result;
+ }
+ 
+ /*
+  * This is only used for the leading column of a sort operation, where
+  * bttext_convert() has already prepared a poor man's normalized key
+  * representation.  The result returned here can only be trusted when non-zero.
+  */
+ static int
+ bttextfastcmp_locale_poorman(Datum x, Datum y, SortSupport ssup)
+ {
+ 	int result;
+ 	char* a = (char*) &x;
+ 	char* b = (char*) &y;
+ 	int len1, len2;
+ 
+ 	len1 = strlen(a);
+ 	len2 = strlen(b);
+ 
+ 	result = memcmp(a, b, Min(len1, len2));
+ 
+ 	/*
+ 	 * Length isn't usually reliable here, so when result = 0, leave it up to
+ 	 * the bttextfastcmp_locale() tie-breaker that the core system is obligated
+ 	 * to perform (rather than matching bttextfastcmp_c()'s length tie-break
+ 	 * behavior).
+ 	 *
+ 	 * Note that even a strcmp() on two non-truncated strxfrm() blobs cannot
+ 	 * indicate equality in a way that is generally trustworthy, for the same
+ 	 * reason that there is a strcoll() strcmp() tie-breaker (there'd still
+ 	 * need to be a strcmp() tie-breaker on the *original* string).
+ 	 */
+ 	return result;
+ }
+ 
+ static Datum
+ bttext_convert(Datum d, SortSupport ssup)
+ {
+ 	Datum				res;
+ 	char			   *pres;
+ 	int					len;
+ 	TextSortSupport	   *tss = (TextSortSupport *) ssup->ssup_extra;
+ 	text			   *full = DatumGetTextPP(d);
+ 
+ 	/*
+ 	 * Convert text into a "poor man's normalized key".  This is a
+ 	 * pass-by-value Datum that is treated as a cstring buffer.  A similar
+ 	 * technique is sometimes called the "small string optimization", used
+ 	 * where a string can be stored in place (typically, there is a
+ 	 * pointer/pass-by-value array union).
+ 	 *
+ 	 * Initialize res to all zeroes.
+ 	 */
+ 	res = 0;
+ 	pres = (char*) &res;
+ 	len = VARSIZE_ANY_EXHDR(full);
+ 
+ 	/* By convention, we use buffer 1 */
+ 	if (len >= tss->buflen1)
+ 	{
+ 		pfree(tss->buf1);
+ 		tss->buflen1 *= 2;
+ 		tss->buf1 = MemoryContextAlloc(ssup->ssup_cxt, tss->buflen1);
+ 	}
+ 
+ 	memcpy(tss->buf1, VARDATA_ANY(full), len);
+ 	tss->buf1[len] = '\0';
+ 
+ #ifdef HAVE_LOCALE_T
+ 	if (tss->locale)
+ 		strxfrm_l(pres, tss->buf1, Min(sizeof(Datum), len), tss->locale);
+ 	else
+ #endif
+ 		strxfrm(pres, tss->buf1, Min(sizeof(Datum), len));
+ 
+ 	pres[Min(sizeof(Datum) - 1, len)] = '\0';
+ 
+ 	return res;
+ }
  
  Datum
  text_larger(PG_FUNCTION_ARGS)
*** a/src/backend/utils/sort/sortsupport.c
--- b/src/backend/utils/sort/sortsupport.c
*************** PrepareSortSupportComparisonShim(Oid cmp
*** 82,87 ****
--- 82,89 ----
  
  	ssup->ssup_extra = extra;
  	ssup->comparator = comparison_shim;
+ 	ssup->converter = NULL;
+ 	ssup->proper = NULL;
  }
  
  /*
*************** PrepareSortSupportFromOrderingOp(Oid ord
*** 104,109 ****
--- 106,113 ----
  		elog(ERROR, "operator %u is not a valid ordering operator",
  			 orderingOp);
  
+ 	ssup->converter = NULL;
+ 
  	if (issupport)
  	{
  		/* The sort support function should provide a comparator */
*** a/src/backend/utils/sort/tuplesort.c
--- b/src/backend/utils/sort/tuplesort.c
*************** bool		optimize_bounded_sort = true;
*** 150,156 ****
   * When sorting single Datums, the data value is represented directly by
   * datum1/isnull1.	If the datatype is pass-by-reference and isnull1 is false,
   * then datum1 points to a separately palloc'd data value that is also pointed
!  * to by the "tuple" pointer; otherwise "tuple" is NULL.
   *
   * While building initial runs, tupindex holds the tuple's run number.  During
   * merge passes, we re-use it to hold the input tape number that each tuple in
--- 150,159 ----
   * When sorting single Datums, the data value is represented directly by
   * datum1/isnull1.	If the datatype is pass-by-reference and isnull1 is false,
   * then datum1 points to a separately palloc'd data value that is also pointed
!  * to by the "tuple" pointer; otherwise "tuple" is NULL.  Note that there are
!  * some exceptions, as when the sort support infrastructure provides a "poor
!  * man's normalized key" representation.   When that occurs, extra precautions
!  * are taken when a comparison involving a pair of datum1s returns 0.
   *
   * While building initial runs, tupindex holds the tuple's run number.  During
   * merge passes, we re-use it to hold the input tape number that each tuple in
*************** tuplesort_begin_heap(TupleDesc tupDesc,
*** 647,657 ****
  		sortKey->ssup_collation = sortCollations[i];
  		sortKey->ssup_nulls_first = nullsFirstFlags[i];
  		sortKey->ssup_attno = attNums[i];
  
  		PrepareSortSupportFromOrderingOp(sortOperators[i], sortKey);
  	}
! 
! 	if (nkeys == 1)
  		state->onlyKey = state->sortKeys;
  
  	MemoryContextSwitchTo(oldcontext);
--- 650,667 ----
  		sortKey->ssup_collation = sortCollations[i];
  		sortKey->ssup_nulls_first = nullsFirstFlags[i];
  		sortKey->ssup_attno = attNums[i];
+ 		sortKey->leading = (i == 0);
  
  		PrepareSortSupportFromOrderingOp(sortOperators[i], sortKey);
  	}
! 	/*
! 	 * The "onlyKey" optimization cannot be used when a tie-breaker for an
! 	 * unreliable poor man's normalized key comparison is required.  Typically,
! 	 * the optimization is only of significant value to pass-by-value types
! 	 * anyway, whereas poor man's normalized keys are typically used by
! 	 * pass-by-reference types.
! 	 */
! 	if (nkeys == 1 && !state->sortKeys->converter)
  		state->onlyKey = state->sortKeys;
  
  	MemoryContextSwitchTo(oldcontext);
*************** comparetup_heap(const SortTuple *a, cons
*** 2870,2875 ****
--- 2880,2907 ----
  	rtup.t_len = ((MinimalTuple) b->tuple)->t_len + MINIMAL_TUPLE_OFFSET;
  	rtup.t_data = (HeapTupleHeader) ((char *) b->tuple - MINIMAL_TUPLE_OFFSET);
  	tupDesc = state->tupDesc;
+ 	/* If leading comparison was unreliable, do reliable comparison */
+ 	if (state->sortKeys->converter)
+ 	{
+ 		AttrNumber	attno = sortKey->ssup_attno;
+ 		Datum		datum1,
+ 					datum2;
+ 		bool		isnull1,
+ 					isnull2;
+ 
+ 		Assert(attno == sortKey->proper->ssup_attno);
+ 		Assert(sortKey->leading);
+ 
+ 		datum1 = heap_getattr(&ltup, attno, tupDesc, &isnull1);
+ 		datum2 = heap_getattr(&rtup, attno, tupDesc, &isnull2);
+ 
+ 		compare = ApplySortComparator(datum1, isnull1,
+ 									  datum2, isnull2,
+ 									  sortKey->proper);
+ 		if (compare != 0)
+ 			return compare;
+ 	}
+ 
  	sortKey++;
  	for (nkey = 1; nkey < state->nKeys; nkey++, sortKey++)
  	{
*************** copytup_heap(Tuplesortstate *state, Sort
*** 2910,2919 ****
  	/* set up first-column key value */
  	htup.t_len = tuple->t_len + MINIMAL_TUPLE_OFFSET;
  	htup.t_data = (HeapTupleHeader) ((char *) tuple - MINIMAL_TUPLE_OFFSET);
! 	stup->datum1 = heap_getattr(&htup,
! 								state->sortKeys[0].ssup_attno,
! 								state->tupDesc,
! 								&stup->isnull1);
  }
  
  static void
--- 2942,2971 ----
  	/* set up first-column key value */
  	htup.t_len = tuple->t_len + MINIMAL_TUPLE_OFFSET;
  	htup.t_data = (HeapTupleHeader) ((char *) tuple - MINIMAL_TUPLE_OFFSET);
! 	if (!state->sortKeys->converter)
! 	{
! 		/* Store ordinary Datum representation */
! 		stup->datum1 = heap_getattr(&htup,
! 									state->sortKeys[0].ssup_attno,
! 									state->tupDesc,
! 									&stup->isnull1);
! 	}
! 	else
! 	{
! 		Datum		temp;
! 
! 		/*
! 		 * Store "poor man's normalized key", which cannot indicate equality in
! 		 * a trustworthy manner, and may require a tie-breaker
! 		 */
! 		Assert(state->sortKeys[0].leading);
! 		temp = heap_getattr(&htup,
! 									state->sortKeys[0].ssup_attno,
! 									state->tupDesc,
! 									&stup->isnull1);
! 
! 		stup->datum1 = state->sortKeys->converter(temp, state->sortKeys);
! 	}
  }
  
  static void
*** a/src/include/catalog/pg_amproc.h
--- b/src/include/catalog/pg_amproc.h
*************** DATA(insert (	1989   26 26 1 356 ));
*** 122,127 ****
--- 122,128 ----
  DATA(insert (	1989   26 26 2 3134 ));
  DATA(insert (	1991   30 30 1 404 ));
  DATA(insert (	1994   25 25 1 360 ));
+ DATA(insert (	1994   25 25 2 3492 ));
  DATA(insert (	1996   1083 1083 1 1107 ));
  DATA(insert (	2000   1266 1266 1 1358 ));
  DATA(insert (	2002   1562 1562 1 1672 ));
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DATA(insert OID = 3135 ( btnamesortsuppo
*** 610,615 ****
--- 610,617 ----
  DESCR("sort support");
  DATA(insert OID = 360 (  bttextcmp		   PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 23 "25 25" _null_ _null_ _null_ _null_ bttextcmp _null_ _null_ _null_ ));
  DESCR("less-equal-greater");
+ DATA(insert OID = 3492 ( bttextsortsupport PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 2278 "2281" _null_ _null_ _null_ _null_ bttextsortsupport _null_ _null_ _null_ ));
+ DESCR("sort support");
  DATA(insert OID = 377 (  cash_cmp		   PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 23 "790 790" _null_ _null_ _null_ _null_ cash_cmp _null_ _null_ _null_ ));
  DESCR("less-equal-greater");
  DATA(insert OID = 380 (  btreltimecmp	   PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 23 "703 703" _null_ _null_ _null_ _null_ btreltimecmp _null_ _null_ _null_ ));
*** a/src/include/utils/builtins.h
--- b/src/include/utils/builtins.h
*************** extern Datum bttintervalcmp(PG_FUNCTION_
*** 312,317 ****
--- 312,318 ----
  extern Datum btcharcmp(PG_FUNCTION_ARGS);
  extern Datum btnamecmp(PG_FUNCTION_ARGS);
  extern Datum bttextcmp(PG_FUNCTION_ARGS);
+ extern Datum bttextsortsupport(PG_FUNCTION_ARGS);
  
  /*
   *		Per-opclass sort support functions for new btrees.	Like the
*** a/src/include/utils/sortsupport.h
--- b/src/include/utils/sortsupport.h
*************** typedef struct SortSupportData
*** 59,64 ****
--- 59,65 ----
  	 */
  	MemoryContext ssup_cxt;		/* Context containing sort info */
  	Oid			ssup_collation; /* Collation to use, or InvalidOid */
+ 	bool		leading;		/* Leading (poor man-applicable) key? */
  
  	/*
  	 * Additional sorting parameters; but unlike ssup_collation, these can be
*************** typedef struct SortSupportData
*** 69,75 ****
  	bool		ssup_nulls_first;		/* sort nulls first? */
  
  	/*
! 	 * These fields are workspace for callers, and should not be touched by
  	 * opclass-specific functions.
  	 */
  	AttrNumber	ssup_attno;		/* column number to sort */
--- 70,76 ----
  	bool		ssup_nulls_first;		/* sort nulls first? */
  
  	/*
! 	 * These fields are workspace for callers, and should only be touched by
  	 * opclass-specific functions.
  	 */
  	AttrNumber	ssup_attno;		/* column number to sort */
*************** typedef struct SortSupportData
*** 81,86 ****
--- 82,97 ----
  	void	   *ssup_extra;		/* Workspace for opclass functions */
  
  	/*
+ 	 * Alternative "proper" SortSupport state for leading key, used only when
+ 	 * an unreliable poor man's normalized key comparator returns 0, and
+ 	 * tuplesort must use this separate state to perform a trustworthy
+ 	 * comparison.  This relates to the same attribute as our ssup_attno.  The
+ 	 * core system expects to be able to check if this is set, and pass it back
+ 	 * after having this SortSupport state return an uncertain 0 result.
+ 	 */
+ 	struct SortSupportData *proper;
+ 
+ 	/*
  	 * Function pointers are zeroed before calling the BTSORTSUPPORT function,
  	 * and must be set by it for any acceleration methods it wants to supply.
  	 * The comparator pointer must be set, others are optional.
*************** typedef struct SortSupportData
*** 96,103 ****
  	int			(*comparator) (Datum x, Datum y, SortSupport ssup);
  
  	/*
! 	 * Additional sort-acceleration functions might be added here later.
  	 */
  } SortSupportData;
  
  
--- 107,120 ----
  	int			(*comparator) (Datum x, Datum y, SortSupport ssup);
  
  	/*
! 	 * Given a datum involving an opclass with relevant support, use this
! 	 * callback to convert to a pass-by-value untrustworthy Datum, or poor
! 	 * man's normalized key.  Our sort support comparator expects this.  When
! 	 * this is set, tuplesort knows not to trust a comparison results of 0.
! 	 * However, -1 or 1 can be trusted, and taken as a perfect proxy for what
! 	 * the full leading key would indicate.
  	 */
+ 	Datum		(*converter) (Datum d, SortSupport ssup);
  } SortSupportData;
  
  
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to