Re: [HACKERS] Inlining comparators as a performance optimisation

Peter Geoghegan Thu, 01 Dec 2011 08:45:47 -0800

Attached is revision of my patch with some clean-ups. In particular,
I'm now using switch statements for greater readability, plus
supporting fast path sorting of the time datatype. I've also updated
the documentation on "Date/Time Types" to note the additional
disadvantage of using the deprecated "store timestamp + friends as
double precision floating-point numbers" compile time option.


There is one aspect to this optimisation that I haven't touched on,
which is the effect on memory consumption. I think that much of the
value that this patch will deliver will come from being able to
release sort memory earlier. Consider that the substantial
improvements in raw sorting speed (far more substantial than the
improvements in query runtime) will sometimes result in a concomitant
reduction in the time that the executor holds onto memory allocated
for sorting. Maybe the effect will only be really noticeable for plans
with a sort node as their root node, but that isn't exactly a rare
occurrence, particularly among large, expensive sorts.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

From cffad91578edd698e89ef2bf7384d82aee26b57e Mon Sep 17 00:00:00 2001
From: peter2ndQuadrant <pe...@2ndquadrant.com>
Date: Sun, 20 Nov 2011 20:36:25 +0000
Subject: [PATCH] Initial commit of optimization

Stop directly using oid

Added int8 quicksort fast path specialisation, which can also be used in
place of F_TIMESTAMP_CMP for HAVE_INT64_TIMESTAMP builds.

Rebased, revised patch for -hackers, with timestamp and int8 fast path
sorting using the same int8 specialization.

Remove unneeded line

Rebase for -hackers

Descriminate against cases where nKeys > 1 when selecting sort
specialisation.

We now support fast path sorting with multiple scanKeys.

We now inline for one scanKey, but do not inline our comparetup_heap
replacement when multiple scanKeys are used (although this is at the
compiler's discretion). However, when multiple scanKeys are used, we
still use a specialisation that will elide the SQL-function-call
machinery for the first scanKey, which is almost as good for most cases.

Further formatting clean-ups

Move specialization selection logic to tuplesort_begin_heap(), per
suggestion of Robert Haas

Add simple switch for testing optimization.

Improvements to template qsort_arg comments. Support fast path sorting
of dates.

Added float fast path sorting

Patch mimimization.

Additional clean-up.

Indentation issue.

Specialization clean-ups

Use switch statement for sort specialization selection

Further comment clean-up.

Additional clean-up

Make comparators consistent

Added date fast path support, documentation updates

Fix template_qsort_arg.h assertion

Assert nScanKeys == 1 for inlining case
---
 doc/src/sgml/datatype.sgml             |    6 +
 src/backend/utils/sort/tuplesort.c     |  223 ++++++++++++++++++++++++-
 src/include/utils/template_qsort_arg.h |  288 ++++++++++++++++++++++++++++++++
 src/port/qsort.c                       |    3 +-
 4 files changed, 512 insertions(+), 8 deletions(-)
 create mode 100644 src/include/utils/template_qsort_arg.h

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index fe59a1c..f5dfbbb 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -1619,6 +1619,12 @@ SELECT E'\\xDEADBEEF';
     floating-point case, large <type>interval</type> values degrade in
     precision as the size of the interval increases.
    </para>
+   <para>
+    Note that if the server is built to represent these types as
+    floating point numbers, that will prevent it from using fast path
+    sort specializations for the types, and sorting columns of those
+    types will be noticeably slower.
+   </para>
    </note>
 
    <para>
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 3505236..0a14a90 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -107,11 +107,13 @@
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "utils/datum.h"
+#include "utils/fmgroids.h"
 #include "utils/logtape.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/pg_rusage.h"
 #include "utils/rel.h"
+#include "utils/template_qsort_arg.h"
 #include "utils/tuplesort.h"
 
 
@@ -226,6 +228,13 @@ struct Tuplesortstate
 										   Tuplesortstate *state);
 
 	/*
+	 * This specialization function pointer is sometimes used as an alternative
+	 * to the standard qsort_arg, when it has been determined that we can 
+	 * benefit from various per-type performance optimizations.
+	 */
+	void (*qsort_arg_spec)(void *a, size_t n, size_t es, void *arg);
+
+	/*
 	 * Function to copy a supplied input tuple into palloc'd space and set up
 	 * its SortTuple representation (ie, set tuple/datum1/isnull1).  Also,
 	 * state->availMem must be decreased by the amount of space used for the
@@ -457,6 +466,11 @@ static void tuplesort_heap_insert(Tuplesortstate *state, SortTuple *tuple,
 static void tuplesort_heap_siftup(Tuplesortstate *state, bool checkIndex);
 static unsigned int getlen(Tuplesortstate *state, int tapenum, bool eofOK);
 static void markrunend(Tuplesortstate *state, int tapenum);
+static inline int32 inlineApplySortFunction(FmgrInfo *sortFunction, 
+						int sk_flags, Oid collation,
+						Datum datum1, bool isNull1,
+						Datum datum2, bool isNull2);
+
 static int comparetup_heap(const SortTuple *a, const SortTuple *b,
 				Tuplesortstate *state);
 static void copytup_heap(Tuplesortstate *state, SortTuple *stup, void *tup);
@@ -576,6 +590,99 @@ tuplesort_begin_common(int workMem, bool randomAccess)
 	return state;
 }
 
+/*
+ * Manufacture type specific sorting specialisations
+ * with inline comparators
+ */
+static inline int
+compare_int4(Datum first, Datum second)
+{
+	int32           a = DatumGetInt32(first);
+	int32           b = DatumGetInt32(second);
+	if (a > b)
+		return 1;
+	else if (a == b)
+		return 0;
+	else
+		return -1;
+}
+
+static inline int
+compare_int8(Datum first, Datum second)
+{
+	int64           a = DatumGetInt64(first);
+	int64           b = DatumGetInt64(second);
+
+	if (a > b)
+		return 1;
+	else if (a == b)
+		return 0;
+	else
+		return -1;
+}
+
+static inline int
+compare_float4(Datum first, Datum second)
+{
+	float4		a = DatumGetFloat4(first);
+	float4		b = DatumGetFloat4(second);
+
+	if (isnan(a))
+	{
+		if (isnan(b))
+			return 0;			/* NAN = NAN */
+		else
+			return 1;			/* NAN > non-NAN */
+	}
+	else if (isnan(b))
+	{
+		return -1;				/* non-NAN < NAN */
+	}
+	else
+	{
+		if (a > b)
+			return 1;
+		else if (a < b)
+			return -1;
+		else
+			return 0;
+	}
+}
+
+static inline int
+compare_float8(Datum first, Datum second)
+{
+	float8		a = DatumGetFloat8(first);
+	float8		b = DatumGetFloat8(second);
+
+	if (isnan(a))
+	{
+		if (isnan(b))
+			return 0;			/* NAN = NAN */
+		else
+			return 1;			/* NAN > non-NAN */
+	}
+	else if (isnan(b))
+	{
+		return -1;				/* non-NAN < NAN */
+	}
+	else
+	{
+		if (a > b)
+			return 1;
+		else if (a < b)
+			return -1;
+		else
+			return 0;
+	}
+}
+
+/* Instantiate sorting specializations */
+TEMPLATE_QSORT_ARG(int4, compare_int4);
+TEMPLATE_QSORT_ARG(int8, compare_int8);
+TEMPLATE_QSORT_ARG(float4, compare_float4);
+TEMPLATE_QSORT_ARG(float8, compare_float8);
+
 Tuplesortstate *
 tuplesort_begin_heap(TupleDesc tupDesc,
 					 int nkeys, AttrNumber *attNums,
@@ -650,6 +757,83 @@ tuplesort_begin_heap(TupleDesc tupDesc,
 							   (Datum) 0);
 	}
 
+	/* Select a qsort specialization, if possible */
+	if (state->scanKeys)
+	{
+		if (state->nKeys == 1)
+		{
+			/* Single scankey, use an inlining specialization */
+			switch (state->scanKeys->sk_func.fn_oid)
+			{
+				/* Some comparison that has an underlying int4 representation */
+				case F_BTINT4CMP:
+					state->qsort_arg_spec = int4inlqsort_arg;
+					break;
+				case F_DATE_CMP:
+					state->qsort_arg_spec = int4inlqsort_arg;
+					break;
+				/* Some comparison that has an underlying int8 representation */
+				case F_BTINT8CMP:
+					state->qsort_arg_spec = int8inlqsort_arg;
+					break;
+#ifdef HAVE_INT64_TIMESTAMP
+				case F_TIMESTAMP_CMP:
+					state->qsort_arg_spec = int8inlqsort_arg;
+					break;
+				case F_TIME_CMP:
+					state->qsort_arg_spec = int8inlqsort_arg;
+					break;
+#endif
+				/* floating point types */
+				case F_BTFLOAT4CMP:
+					state->qsort_arg_spec = float4inlqsort_arg;
+					break;
+				case F_BTFLOAT8CMP:
+					state->qsort_arg_spec = float8inlqsort_arg;
+					break;
+				default:
+					state->qsort_arg_spec = NULL;
+			}
+		}
+		else
+		{
+			/*
+			 * Multiple scankeys, use a specialisation to elide
+			 * SQL function call machinery for the first
+			 */
+			switch (state->scanKeys->sk_func.fn_oid)
+			{
+				/* Some comparison that has an underlying int4 representation */
+				case F_BTINT4CMP:
+					state->qsort_arg_spec = int4regqsort_arg;
+					break;
+				case F_DATE_CMP:
+					state->qsort_arg_spec = int4regqsort_arg;
+					break;
+				/* Some comparison that has an underlying int8 representation */
+				case F_BTINT8CMP:
+					state->qsort_arg_spec = int8regqsort_arg;
+					break;
+#ifdef HAVE_INT64_TIMESTAMP
+				case F_TIMESTAMP_CMP:
+					state->qsort_arg_spec = int8regqsort_arg;
+					break;
+				case F_TIME_CMP:
+					state->qsort_arg_spec = int8regqsort_arg;
+					break;
+#endif
+				/* floating point types */
+				case F_BTFLOAT4CMP:
+					state->qsort_arg_spec = float4regqsort_arg;
+					break;
+				case F_BTFLOAT8CMP:
+					state->qsort_arg_spec = float8regqsort_arg;
+					break;
+				default:
+					state->qsort_arg_spec = NULL;
+			}
+		}
+	}
 	MemoryContextSwitchTo(oldcontext);
 
 	return state;
@@ -1225,7 +1409,7 @@ puttuple_common(Tuplesortstate *state, SortTuple *tuple)
 }
 
 /*
- * All tuples have been provided; finish the sort.
+ * All tuples have been provided; perform the sort.
  */
 void
 tuplesort_performsort(Tuplesortstate *state)
@@ -1247,11 +1431,28 @@ tuplesort_performsort(Tuplesortstate *state)
 			 * amount of memory.  Just qsort 'em and we're done.
 			 */
 			if (state->memtupcount > 1)
-				qsort_arg((void *) state->memtuples,
-						  state->memtupcount,
-						  sizeof(SortTuple),
-						  (qsort_arg_comparator) state->comparetup,
-						  (void *) state);
+			{
+				/* Consider a sorting specialization */
+				if (state->qsort_arg_spec)
+				{
+					state->qsort_arg_spec((void *) state->memtuples,
+								  state->memtupcount,
+								  sizeof(SortTuple),
+								  (void *) state);
+				}
+				else
+				{
+					/*
+					 * Fall back on type-neutral qsort with
+					 * function-pointer comparator
+					 */
+					qsort_arg((void *) state->memtuples,
+								  state->memtupcount,
+								  sizeof(SortTuple),
+								  (qsort_arg_comparator) state->comparetup,
+								  (void *) state);
+				}
+			}
 			state->current = 0;
 			state->eof_reached = false;
 			state->markpos_offset = 0;
@@ -2657,6 +2858,9 @@ myFunctionCall2Coll(FmgrInfo *flinfo, Oid collation, Datum arg1, Datum arg2)
  * and return a 3-way comparison result.  This takes care of handling
  * reverse-sort and NULLs-ordering properly.  We assume that DESC and
  * NULLS_FIRST options are encoded in sk_flags the same way btree does it.
+ *
+ * Note that this logic is partially duplicated in template_qsort_arg.h,
+ * from which it is "instantiated" per-type.
  */
 static inline int32
 inlineApplySortFunction(FmgrInfo *sortFunction, int sk_flags, Oid collation,
@@ -2708,10 +2912,15 @@ ApplySortFunction(FmgrInfo *sortFunction, int sortFlags, Oid collation,
 }
 
 
+
+
 /*
  * Routines specialized for HeapTuple (actually MinimalTuple) case
+ *
+ * N.B. : Some of this code is partially duplicated for the purposes 
+ * of heap comparison specializations. Take care to keep the code
+ * within template_qsort_arg.h consistent with this code.
  */
-
 static int
 comparetup_heap(const SortTuple *a, const SortTuple *b, Tuplesortstate *state)
 {
diff --git a/src/include/utils/template_qsort_arg.h b/src/include/utils/template_qsort_arg.h
new file mode 100644
index 0000000..9fee892
--- /dev/null
+++ b/src/include/utils/template_qsort_arg.h
@@ -0,0 +1,288 @@
+/*-------------------------------------------------------------------------
+ *	template_qsort_arg.h: "template" version of qsort_arg.c
+ *
+ *	This version of qsort_arg is exclusively used within tuplesort.c to	more
+ *	efficiently sort common types such as integers and floats. In providing
+ *	this version, we seek to take advantage of compile-time	optimizations,
+ *	in particular, the inlining of comparators, as well as reducing the
+ *	overhead of comparisons that the general SQL-callable-function API imposes.
+ *
+ *	The TEMPLATE_QSORT_ARG() macro generates an inlining variant (for sorts
+ *	with a single scanKey) and non-inlining variant for sorts with multiple
+ *	scanKeys.
+ *
+ *	We rely on various function declarations, and indeed partially duplicate
+ *	some code from tuplesort.c, so this file should be considered private to
+ *	that module, rather than a generic piece of infrastructure.
+ *
+ *	CAUTION: if you change this file, see also qsort_arg.c as well as
+ *	qsort.c. qsort_arg.c should be considered authoratative.
+ *
+ *	Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ *	Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *	src/include/utils/template_qsort_arg.h
+ *-------------------------------------------------------------------------
+ */
+
+#include "c.h"
+
+#define swapcode(TYPE, parmi, parmj, n) \
+do {									\
+	size_t i = (n) / sizeof (TYPE);		\
+	TYPE *pi = (TYPE *)(void *)(parmi);	\
+	TYPE *pj = (TYPE *)(void *)(parmj);	\
+	do {								\
+		TYPE	t = *pi;				\
+		*pi++ = *pj;					\
+		*pj++ = t;						\
+		} while (--i > 0);				\
+} while (0)
+
+#define SWAPINIT(a, es) swaptype = ((char *)(a) - (char *)0) % sizeof(long) || \
+	(es) % sizeof(long) ? 2 : (es) == sizeof(long)? 0 : 1;
+
+#define thistype_vecswap(TYPE, a, b, n, SPEC_VAR)							\
+	if ((n) > 0) TYPE##SPEC_VAR##swapfunc((a), (b), (size_t)(n), swaptype) 	
+
+#define thistype_swap(TYPE, a, b, SPEC_VAR)			\
+if (swaptype == 0) {								\
+		long t = *(long *)(void *)(a);				\
+		*(long *)(void *)(a) = *(long *)(void *)(b);\
+		*(long *)(void *)(b) = t;					\
+	} else											\
+		TYPE##SPEC_VAR##swapfunc(a, b, es, swaptype)	
+
+/* 
+ * This macro manufactures a type-specific implementation of qsort_arg with
+ * the comparator, COMPAR, known at compile time. COMPAR is typically an
+ * inline function.
+ *
+ * COMPAR should take as its arguments two Datums, and return an int, in
+ * line with standard qsort convention.
+ * 
+ * We have void* parameters for TYPE##AppSort just to shut up the compiler.
+ * They could be SortTuple pointers instead, but that would make it more
+ * difficult to keep template_qsort_arg.h consistent with tuplesort.c.
+ *
+ */
+
+#define DO_TEMPLATE_QSORT_ARG(TYPE, COMPAR, SPEC_VAR, REG_ADDITIONAL_CODE)			\
+void TYPE##SPEC_VAR##qsort_arg(void *a, size_t n, size_t es, void *arg);			\
+																					\
+static inline int32																	\
+TYPE##SPEC_VAR##AppSort(const void *a, const void *b, Tuplesortstate *state)		\
+{																					\
+	int32		compare;															\
+	const SortTuple* aT = a;														\
+	const SortTuple* bT = b;														\
+																					\
+	/* Allow interrupting long sorts */												\
+	CHECK_FOR_INTERRUPTS();															\
+																					\
+	Assert(state->scanKeys);														\
+																					\
+	if ( aT->isnull1)																\
+	{																				\
+		if (bT->isnull1)															\
+			compare = 0;		/* NULL "=" NULL */									\
+		else if (state->scanKeys->sk_flags & SK_BT_NULLS_FIRST)						\
+			compare = -1;		/* NULL "<" NOT_NULL */								\
+		else																		\
+			compare = 1;		/* NULL ">" NOT_NULL */								\
+	}																				\
+	else if (bT->isnull1)															\
+	{																				\
+		if (state->scanKeys->sk_flags & SK_BT_NULLS_FIRST)							\
+			compare = 1;		/* NOT_NULL ">" NULL */								\
+		else																		\
+			compare = -1;		/* NOT_NULL "<" NULL */								\
+	}																				\
+	else																			\
+	{																				\
+		compare = COMPAR(aT->datum1, bT->datum1);									\
+																					\
+		if (state->scanKeys->sk_flags & SK_BT_DESC)									\
+			compare = -compare;														\
+	}																				\
+	if (compare != 0) 																\
+		return compare;																\
+	/* Additional code for variants where we have more than one scanKey */ 			\
+	REG_ADDITIONAL_CODE 															\
+	return 0;																		\
+}																					\
+																					\
+static inline void													\
+TYPE##SPEC_VAR##swapfunc(char *a, char *b, size_t n, int swaptype)	\
+{																	\
+	if (swaptype <= 1)												\
+		swapcode(long, a, b, n); 									\
+	else															\
+		swapcode(char, a, b, n);									\
+}																	\
+																	\
+static inline char *														\
+TYPE##SPEC_VAR##med3(char *a, char *b, char *c, void *arg)					\
+{																			\
+	return TYPE##SPEC_VAR##AppSort(a, b, arg) < 0 ?							\
+		(TYPE##SPEC_VAR##AppSort(b, c, arg) < 0 ? 							\
+					b : (TYPE##SPEC_VAR##AppSort(a, c, arg) < 0 ? c : a))	\
+		: (TYPE##SPEC_VAR##AppSort(b, c, arg) > 0 ? 						\
+					b : (TYPE##SPEC_VAR##AppSort(a, c, arg) < 0 ? a : c));	\
+}																			\
+																			\
+void 																	\
+TYPE##SPEC_VAR##qsort_arg(void *a, size_t n, size_t es, void *arg)		\
+{																		\
+	char	   *pa,														\
+			   *pb,														\
+			   *pc,														\
+			   *pd,														\
+			   *pl,														\
+			   *pm,														\
+			   *pn;														\
+	int			d,														\
+				r,														\
+				swaptype,												\
+				presorted;												\
+																		\
+loop:SWAPINIT(a, es); 													\
+	if (n < 7)															\
+	{																	\
+		for (pm = (char *) a + es; pm < (char *) a + n * es; pm += es)	\
+			for (pl = pm; pl > (char *) a && 							\
+					TYPE##SPEC_VAR##AppSort(pl - es, pl, arg) > 0;		\
+				 pl -= es)												\
+				thistype_swap(TYPE, pl, pl - es, SPEC_VAR);				\
+		return;															\
+	}																	\
+	presorted = 1;														\
+	for (pm = (char *) a + es; pm < (char *) a + n * es; pm += es)		\
+	{																	\
+		if (TYPE##SPEC_VAR##AppSort(pm - es, pm, arg) > 0)				\
+		{																\
+			presorted = 0;												\
+			break;														\
+		}																\
+	}																	\
+	if (presorted)														\
+		return;															\
+	pm = (char *) a + (n / 2) * es;										\
+	if (n > 7)															\
+	{																	\
+		pl = (char *) a;												\
+		pn = (char *) a + (n - 1) * es;									\
+		if (n > 40)														\
+		{																\
+			d = (n / 8) * es;											\
+			pl = TYPE##SPEC_VAR##med3(pl, pl + d, pl + 2 * d, arg);		\
+			pm = TYPE##SPEC_VAR##med3(pm - d, pm, pm + d, arg);			\
+			pn = TYPE##SPEC_VAR##med3(pn - 2 * d, pn - d, pn, arg);		\
+		}																\
+		pm = TYPE##SPEC_VAR##med3(pl, pm, pn, arg);						\
+	}																	\
+	thistype_swap(TYPE, a, pm, SPEC_VAR);								\
+	pa = pb = (char *) a + es;											\
+	pc = pd = (char *) a + (n - 1) * es;								\
+	for (;;)															\
+	{																	\
+		while (pb <= pc && 												\
+					(r = TYPE##SPEC_VAR##AppSort(pb, a, arg)) <= 0)		\
+		{																\
+			if (r == 0)													\
+			{															\
+				thistype_swap(TYPE, pa, pb, SPEC_VAR);					\
+				pa += es;												\
+			}															\
+			pb += es;													\
+		}																\
+		while (pb <= pc && 												\
+				(r = TYPE##SPEC_VAR##AppSort(pc, a, arg)) >= 0)			\
+		{																\
+			if (r == 0)													\
+			{															\
+				thistype_swap(TYPE, pc, pd, SPEC_VAR);					\
+				pd -= es;												\
+			}															\
+			pc -= es;													\
+		}																\
+		if (pb > pc)													\
+			break;														\
+		thistype_swap(TYPE, pb, pc, SPEC_VAR);							\
+		pb += es;														\
+		pc -= es;														\
+	}																	\
+	pn = (char *) a + n * es;											\
+	r = Min(pa - (char *) a, pb - pa);									\
+	thistype_vecswap(TYPE, a, pb - r, r, SPEC_VAR);						\
+	r = Min(pd - pc, pn - pd - es);										\
+	thistype_vecswap(TYPE, pb, pn - r, r, SPEC_VAR);					\
+	if ((r = pb - pa) > es)												\
+		TYPE##SPEC_VAR##qsort_arg(a, r / es, es, arg);					\
+	if ((r = pd - pc) > es)												\
+	{																	\
+		/* Iterate rather than recurse to save stack space */			\
+		a = pn - r;														\
+		n = r / es;														\
+		goto loop;														\
+	}																	\
+}
+
+/*
+ * This code becomes part of the comparator meta-function for the "reg" 
+ * specialization variant of each datatype-specific specialization.
+ *
+ * For "reg", we can handle multiple scankeys, but the function generally
+ * will not be inlined. Inlining has been found to offer additional performance 
+ * benefits beyond eliding the regular SQL function-call machinery.
+ *
+ * We do not elide the regular SQL function-call machinery in the "reg" case
+ * for the second and subsequent scanKeys. The higher the cardinality of the
+ * first scanKey column, the less frequently we'll need to fall back.
+ */ 
+
+#define REG_ADDITIONAL_CODE 														\
+{												 									\
+	ScanKey		scanKey = state->scanKeys;											\
+	HeapTupleData ltup;																\
+	HeapTupleData rtup;																\
+	TupleDesc	tupDesc;															\
+	int			nkey;																\
+																					\
+	Assert(state->nKeys > 1);														\
+	/* Compare additional sort keys */												\
+	ltup.t_len = ((MinimalTuple) aT->tuple)->t_len + MINIMAL_TUPLE_OFFSET;			\
+	ltup.t_data = (HeapTupleHeader) ((char *) aT->tuple - MINIMAL_TUPLE_OFFSET);	\
+	rtup.t_len = ((MinimalTuple) bT->tuple)->t_len + MINIMAL_TUPLE_OFFSET;			\
+	rtup.t_data = (HeapTupleHeader) ((char *) bT->tuple - MINIMAL_TUPLE_OFFSET);	\
+	tupDesc = state->tupDesc;														\
+	scanKey++;																		\
+	for (nkey = 1; nkey < state->nKeys; nkey++, scanKey++)							\
+	{																				\
+		AttrNumber	attno = scanKey->sk_attno;										\
+		Datum		datum1,															\
+					datum2;															\
+		bool		isnull1,														\
+					isnull2;														\
+																					\
+		datum1 = heap_getattr(&ltup, attno, tupDesc, &isnull1);						\
+		datum2 = heap_getattr(&rtup, attno, tupDesc, &isnull2);						\
+																					\
+		compare = inlineApplySortFunction(&scanKey->sk_func,						\
+										  scanKey->sk_flags,						\
+										  scanKey->sk_collation,					\
+										  datum1, isnull1,							\
+										  datum2, isnull2);							\
+		if (compare != 0)															\
+			return compare;															\
+	}																				\
+}
+#define INL_ADDITIONAL_CODE Assert(state->nKeys == 1);
+/* 
+ * Manufacture inlining variant for nKeys=1 case, and non-inlining variant
+ * for nKeys > 1 case
+ */	
+#define TEMPLATE_QSORT_ARG(TYPE, COMPAR)						\
+DO_TEMPLATE_QSORT_ARG(TYPE, COMPAR, inl, INL_ADDITIONAL_CODE)	\
+DO_TEMPLATE_QSORT_ARG(TYPE, COMPAR, reg, REG_ADDITIONAL_CODE)	\
+
diff --git a/src/port/qsort.c b/src/port/qsort.c
index 8e2c6d9..c949c8e 100644
--- a/src/port/qsort.c
+++ b/src/port/qsort.c
@@ -7,7 +7,8 @@
  *	  Remove ill-considered "swap_cnt" switch to insertion sort,
  *	  in favor of a simple check for presorted input.
  *
- *	CAUTION: if you change this file, see also qsort_arg.c
+ *	CAUTION: if you change this file, see also qsort_arg.c and 
+ *	template_qsort_arg.h
  *
  *	src/port/qsort.c
  */
-- 
1.7.7.3

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Inlining comparators as a performance optimisation

Reply via email to