Re: [HACKERS] Re: Reusing abbreviated keys during second pass of ordered [set] aggregates

Peter Geoghegan Thu, 17 Dec 2015 20:44:53 -0800

On Wed, Dec 16, 2015 at 12:04 PM, Peter Geoghegan <p...@heroku.com> wrote:
>> What kind of state is that?  Can't we define this in terms of what it
>> is rather than how it gets that way?
>
> It's zeroed.
>
> I guess we can update everything, including existing comments, to reflect 
> that.


Attached revision updates both the main commit (the optimization), and
the backpatch commit (updated the contract).

Changes:

* Changes commit intended for backpatch to 9.5 to use new "zeroed"
terminology (not "void" representation).

* Puts fix within mergerun() (for this patch) in the same backpatch
commit. We might as well keep this consistent, even though the
correctness of 9.5 does not actually hinge upon having this change --
there is zero risk of breaking something in 9.5, and avoiding
backpatching this part can only lead to confusion in the future.

* Some minor tweaks to tuplesort.c. Make sure that
tuplesort_putdatum() zeroes datum1 directly for NULL values. Comment
tweaks.

* Add comment to the datum case, discussing restriction there.

I was wrong when I said that the datum sort case currently tacitly
acknowledges as possible/useful something that the revised SortSupport
contract will forbid (I wrote that e-mail in haste).

What I propose to make the SortSupport contract forbid here is
abbreviated keys that are *themselves* pass-by-reference. It is true
that today, we do not support pass-by-value types with abbreviated
keys for the datum case only (such opclass support could be slightly
useful for float8, for example -- int8 comparisons of an alternative
representation might be decisively cheaper). But that existing
restriction is entirely distinct to the new restriction I propose to
create. It seems like this distinction should be pointed out in
passing, which I've done.

-- 
Peter Geoghegan

From c740eac2a4ba42ac2b5ab34c342534ff8d944666 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <peter.geoghega...@gmail.com>
Date: Mon, 6 Jul 2015 13:37:26 -0700
Subject: [PATCH 2/2] Reuse abbreviated keys in ordered [set] aggregates

When processing ordered aggregates following a sort that could make use
of the abbreviated key optimization, only call the equality operator to
compare successive pairs of tuples when their abbreviated keys were not
equal.  Only strict abbreviated key binary inequality is considered,
which is safe.
---
 src/backend/catalog/index.c            |  2 +-
 src/backend/executor/nodeAgg.c         | 20 ++++++++++++----
 src/backend/executor/nodeSort.c        |  2 +-
 src/backend/utils/adt/orderedsetaggs.c | 33 +++++++++++++++++--------
 src/backend/utils/sort/tuplesort.c     | 44 ++++++++++++++++++++++++++++------
 src/include/utils/tuplesort.h          |  4 ++--
 6 files changed, 79 insertions(+), 26 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index c10be3d..c6737c2 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -3073,7 +3073,7 @@ validate_index_heapscan(Relation heapRelation,
 			}
 
 			tuplesort_empty = !tuplesort_getdatum(state->tuplesort, true,
-												  &ts_val, &ts_isnull);
+												  &ts_val, &ts_isnull, NULL);
 			Assert(tuplesort_empty || !ts_isnull);
 			if (!tuplesort_empty)
 			{
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 2e36855..2972180 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -539,7 +539,8 @@ fetch_input_tuple(AggState *aggstate)
 
 	if (aggstate->sort_in)
 	{
-		if (!tuplesort_gettupleslot(aggstate->sort_in, true, aggstate->sort_slot))
+		if (!tuplesort_gettupleslot(aggstate->sort_in, true,
+									aggstate->sort_slot, NULL))
 			return NULL;
 		slot = aggstate->sort_slot;
 	}
@@ -894,8 +895,8 @@ advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup)
  * The one-input case is handled separately from the multi-input case
  * for performance reasons: for single by-value inputs, such as the
  * common case of count(distinct id), the tuplesort_getdatum code path
- * is around 300% faster.  (The speedup for by-reference types is less
- * but still noticeable.)
+ * is around 300% faster.  (The speedup for by-reference types without
+ * abbreviated key support is less but still noticeable.)
  *
  * This function handles only one grouping set (already set in
  * aggstate->current_set).
@@ -913,6 +914,8 @@ process_ordered_aggregate_single(AggState *aggstate,
 	MemoryContext workcontext = aggstate->tmpcontext->ecxt_per_tuple_memory;
 	MemoryContext oldContext;
 	bool		isDistinct = (pertrans->numDistinctCols > 0);
+	Datum		newAbbrevVal = (Datum) 0;
+	Datum		oldAbbrevVal = (Datum) 0;
 	FunctionCallInfo fcinfo = &pertrans->transfn_fcinfo;
 	Datum	   *newVal;
 	bool	   *isNull;
@@ -932,7 +935,7 @@ process_ordered_aggregate_single(AggState *aggstate,
 	 */
 
 	while (tuplesort_getdatum(pertrans->sortstates[aggstate->current_set],
-							  true, newVal, isNull))
+							  true, newVal, isNull, &newAbbrevVal))
 	{
 		/*
 		 * Clear and select the working context for evaluation of the equality
@@ -950,6 +953,7 @@ process_ordered_aggregate_single(AggState *aggstate,
 			haveOldVal &&
 			((oldIsNull && *isNull) ||
 			 (!oldIsNull && !*isNull &&
+			  oldAbbrevVal == newAbbrevVal &&
 			  DatumGetBool(FunctionCall2(&pertrans->equalfns[0],
 										 oldVal, *newVal)))))
 		{
@@ -965,6 +969,7 @@ process_ordered_aggregate_single(AggState *aggstate,
 				pfree(DatumGetPointer(oldVal));
 			/* and remember the new one for subsequent equality checks */
 			oldVal = *newVal;
+			oldAbbrevVal = newAbbrevVal;
 			oldIsNull = *isNull;
 			haveOldVal = true;
 		}
@@ -1002,6 +1007,8 @@ process_ordered_aggregate_multi(AggState *aggstate,
 	TupleTableSlot *slot2 = pertrans->uniqslot;
 	int			numTransInputs = pertrans->numTransInputs;
 	int			numDistinctCols = pertrans->numDistinctCols;
+	Datum		newAbbrevVal = (Datum) 0;
+	Datum		oldAbbrevVal = (Datum) 0;
 	bool		haveOldValue = false;
 	int			i;
 
@@ -1012,7 +1019,7 @@ process_ordered_aggregate_multi(AggState *aggstate,
 		ExecClearTuple(slot2);
 
 	while (tuplesort_gettupleslot(pertrans->sortstates[aggstate->current_set],
-								  true, slot1))
+								  true, slot1, &newAbbrevVal))
 	{
 		/*
 		 * Extract the first numTransInputs columns as datums to pass to the
@@ -1023,6 +1030,7 @@ process_ordered_aggregate_multi(AggState *aggstate,
 
 		if (numDistinctCols == 0 ||
 			!haveOldValue ||
+			newAbbrevVal != oldAbbrevVal ||
 			!execTuplesMatch(slot1, slot2,
 							 numDistinctCols,
 							 pertrans->sortColIdx,
@@ -1046,6 +1054,8 @@ process_ordered_aggregate_multi(AggState *aggstate,
 
 				slot2 = slot1;
 				slot1 = tmpslot;
+				/* avoid execTuplesMatch() calls by reusing abbreviated keys */
+				oldAbbrevVal = newAbbrevVal;
 				haveOldValue = true;
 			}
 		}
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index af1dccf..35dd993 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -137,7 +137,7 @@ ExecSort(SortState *node)
 	slot = node->ss.ps.ps_ResultTupleSlot;
 	(void) tuplesort_gettupleslot(tuplesortstate,
 								  ScanDirectionIsForward(dir),
-								  slot);
+								  slot, NULL);
 	return slot;
 }
 
diff --git a/src/backend/utils/adt/orderedsetaggs.c b/src/backend/utils/adt/orderedsetaggs.c
index 39ed85b..1a5f696 100644
--- a/src/backend/utils/adt/orderedsetaggs.c
+++ b/src/backend/utils/adt/orderedsetaggs.c
@@ -453,7 +453,7 @@ percentile_disc_final(PG_FUNCTION_ARGS)
 			elog(ERROR, "missing row in percentile_disc");
 	}
 
-	if (!tuplesort_getdatum(osastate->sortstate, true, &val, &isnull))
+	if (!tuplesort_getdatum(osastate->sortstate, true, &val, &isnull, NULL))
 		elog(ERROR, "missing row in percentile_disc");
 
 	/*
@@ -553,7 +553,7 @@ percentile_cont_final_common(FunctionCallInfo fcinfo,
 	if (!tuplesort_skiptuples(osastate->sortstate, first_row, true))
 		elog(ERROR, "missing row in percentile_cont");
 
-	if (!tuplesort_getdatum(osastate->sortstate, true, &first_val, &isnull))
+	if (!tuplesort_getdatum(osastate->sortstate, true, &first_val, &isnull, NULL))
 		elog(ERROR, "missing row in percentile_cont");
 	if (isnull)
 		PG_RETURN_NULL();
@@ -564,7 +564,7 @@ percentile_cont_final_common(FunctionCallInfo fcinfo,
 	}
 	else
 	{
-		if (!tuplesort_getdatum(osastate->sortstate, true, &second_val, &isnull))
+		if (!tuplesort_getdatum(osastate->sortstate, true, &second_val, &isnull, NULL))
 			elog(ERROR, "missing row in percentile_cont");
 
 		if (isnull)
@@ -792,7 +792,7 @@ percentile_disc_multi_final(PG_FUNCTION_ARGS)
 				if (!tuplesort_skiptuples(osastate->sortstate, target_row - rownum - 1, true))
 					elog(ERROR, "missing row in percentile_disc");
 
-				if (!tuplesort_getdatum(osastate->sortstate, true, &val, &isnull))
+				if (!tuplesort_getdatum(osastate->sortstate, true, &val, &isnull, NULL))
 					elog(ERROR, "missing row in percentile_disc");
 
 				rownum = target_row;
@@ -921,7 +921,8 @@ percentile_cont_multi_final_common(FunctionCallInfo fcinfo,
 				if (!tuplesort_skiptuples(osastate->sortstate, first_row - rownum - 1, true))
 					elog(ERROR, "missing row in percentile_cont");
 
-				if (!tuplesort_getdatum(osastate->sortstate, true, &first_val, &isnull) || isnull)
+				if (!tuplesort_getdatum(osastate->sortstate, true, &first_val,
+										&isnull, NULL) || isnull)
 					elog(ERROR, "missing row in percentile_cont");
 
 				rownum = first_row;
@@ -941,7 +942,8 @@ percentile_cont_multi_final_common(FunctionCallInfo fcinfo,
 			/* Fetch second_row if needed */
 			if (second_row > rownum)
 			{
-				if (!tuplesort_getdatum(osastate->sortstate, true, &second_val, &isnull) || isnull)
+				if (!tuplesort_getdatum(osastate->sortstate, true, &second_val,
+										&isnull, NULL) || isnull)
 					elog(ERROR, "missing row in percentile_cont");
 				rownum++;
 			}
@@ -1016,6 +1018,8 @@ mode_final(PG_FUNCTION_ARGS)
 	int64		last_val_freq = 0;
 	bool		last_val_is_mode = false;
 	FmgrInfo   *equalfn;
+	Datum		abbrev_val = (Datum) 0;
+	Datum		last_abbrev_val = (Datum) 0;
 	bool		shouldfree;
 
 	Assert(AggCheckCallContext(fcinfo, NULL) == AGG_CONTEXT_AGGREGATE);
@@ -1042,7 +1046,7 @@ mode_final(PG_FUNCTION_ARGS)
 	tuplesort_performsort(osastate->sortstate);
 
 	/* Scan tuples and count frequencies */
-	while (tuplesort_getdatum(osastate->sortstate, true, &val, &isnull))
+	while (tuplesort_getdatum(osastate->sortstate, true, &val, &isnull, &abbrev_val))
 	{
 		/* we don't expect any nulls, but ignore them if found */
 		if (isnull)
@@ -1054,8 +1058,10 @@ mode_final(PG_FUNCTION_ARGS)
 			mode_val = last_val = val;
 			mode_freq = last_val_freq = 1;
 			last_val_is_mode = true;
+			last_abbrev_val = abbrev_val;
 		}
-		else if (DatumGetBool(FunctionCall2(equalfn, val, last_val)))
+		else if (abbrev_val == last_abbrev_val &&
+				 DatumGetBool(FunctionCall2(equalfn, val, last_val)))
 		{
 			/* value equal to previous value, count it */
 			if (last_val_is_mode)
@@ -1078,6 +1084,8 @@ mode_final(PG_FUNCTION_ARGS)
 			if (shouldfree && !last_val_is_mode)
 				pfree(DatumGetPointer(last_val));
 			last_val = val;
+			/* avoid equality function calls by reusing abbreviated keys */
+			last_abbrev_val = abbrev_val;
 			last_val_freq = 1;
 			last_val_is_mode = false;
 		}
@@ -1181,7 +1189,7 @@ hypothetical_rank_common(FunctionCallInfo fcinfo, int flag,
 	tuplesort_performsort(osastate->sortstate);
 
 	/* iterate till we find the hypothetical row */
-	while (tuplesort_gettupleslot(osastate->sortstate, true, slot))
+	while (tuplesort_gettupleslot(osastate->sortstate, true, slot, NULL))
 	{
 		bool		isnull;
 		Datum		d = slot_getattr(slot, nargs + 1, &isnull);
@@ -1266,6 +1274,8 @@ hypothetical_dense_rank_final(PG_FUNCTION_ARGS)
 	int64		duplicate_count = 0;
 	OSAPerGroupState *osastate;
 	int			numDistinctCols;
+	Datum		abbrevVal = (Datum) 0;
+	Datum		abbrevOld = (Datum) 0;
 	AttrNumber *sortColIdx;
 	FmgrInfo   *equalfns;
 	TupleTableSlot *slot;
@@ -1342,7 +1352,7 @@ hypothetical_dense_rank_final(PG_FUNCTION_ARGS)
 	slot2 = extraslot;
 
 	/* iterate till we find the hypothetical row */
-	while (tuplesort_gettupleslot(osastate->sortstate, true, slot))
+	while (tuplesort_gettupleslot(osastate->sortstate, true, slot, &abbrevVal))
 	{
 		bool		isnull;
 		Datum		d = slot_getattr(slot, nargs + 1, &isnull);
@@ -1353,6 +1363,7 @@ hypothetical_dense_rank_final(PG_FUNCTION_ARGS)
 
 		/* count non-distinct tuples */
 		if (!TupIsNull(slot2) &&
+			abbrevVal == abbrevOld &&
 			execTuplesMatch(slot, slot2,
 							numDistinctCols,
 							sortColIdx,
@@ -1363,6 +1374,8 @@ hypothetical_dense_rank_final(PG_FUNCTION_ARGS)
 		tmpslot = slot2;
 		slot2 = slot;
 		slot = tmpslot;
+		/* avoid execTuplesMatch() calls by reusing abbreviated keys */
+		abbrevOld = abbrevVal;
 
 		rank++;
 
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 5e12c0b..4c7c2b1 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -1282,7 +1282,8 @@ tuplesort_putindextuplevalues(Tuplesortstate *state, Relation rel,
 		 * converter it won't expect NULL values, and cost model is not
 		 * required to account for NULL, so in that case we avoid calling
 		 * converter and just set datum1 to zeroed representation (to be
-		 * consistent).
+		 * consistent, and to support cheap inequality tests for NULL
+		 * abbreviated keys).
 		 */
 		stup.datum1 = original;
 	}
@@ -1351,7 +1352,11 @@ tuplesort_putdatum(Tuplesortstate *state, Datum val, bool isNull)
 
 	if (isNull || state->datumTypeByVal)
 	{
-		stup.datum1 = val;
+		/*
+		 * Set datum1 to zeroed representation for NULLs (to be consistent, and
+		 * to support cheap inequality tests for NULL abbreviated keys).
+		 */
+		stup.datum1 = !isNull ? val : (Datum) 0;
 		stup.isnull1 = isNull;
 		stup.tuple = NULL;		/* no separate storage */
 	}
@@ -1868,10 +1873,17 @@ tuplesort_gettuple_common(Tuplesortstate *state, bool forward,
  * Fetch the next tuple in either forward or back direction.
  * If successful, put tuple in slot and return TRUE; else, clear the slot
  * and return FALSE.
+ *
+ * Caller may optionally be passed back abbreviated value (on TRUE return
+ * value) when abbreviation was used, which can be used to cheaply avoid
+ * equality checks that might otherwise be required.  Caller can safely make a
+ * determination of "non-equal tuple" based on simple binary inequality.  A
+ * NULL value in leading attribute will set abbreviated value to zeroed
+ * representation, which caller may rely on in abbreviated inequality check.
  */
 bool
 tuplesort_gettupleslot(Tuplesortstate *state, bool forward,
-					   TupleTableSlot *slot)
+					   TupleTableSlot *slot, Datum *abbrev)
 {
 	MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext);
 	SortTuple	stup;
@@ -1884,6 +1896,10 @@ tuplesort_gettupleslot(Tuplesortstate *state, bool forward,
 
 	if (stup.tuple)
 	{
+		/* Record abbreviated key for caller */
+		if (state->sortKeys->abbrev_converter && abbrev)
+			*abbrev = stup.datum1;
+
 		ExecStoreMinimalTuple((MinimalTuple) stup.tuple, slot, should_free);
 		return true;
 	}
@@ -1939,10 +1955,17 @@ tuplesort_getindextuple(Tuplesortstate *state, bool forward,
  *
  * If the Datum is pass-by-ref type, the returned value is freshly palloc'd
  * and is now owned by the caller.
+ *
+ * Caller may optionally be passed back abbreviated value (on TRUE return
+ * value) when abbreviation was used, which can be used to cheaply avoid
+ * equality checks that might otherwise be required.  Caller can safely make a
+ * determination of "non-equal tuple" based on simple binary inequality.  A
+ * NULL value will have a zeroed abbreviated value representation, which caller
+ * may rely on in abbreviated inequality check.
  */
 bool
 tuplesort_getdatum(Tuplesortstate *state, bool forward,
-				   Datum *val, bool *isNull)
+				   Datum *val, bool *isNull, Datum *abbrev)
 {
 	MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext);
 	SortTuple	stup;
@@ -1954,6 +1977,10 @@ tuplesort_getdatum(Tuplesortstate *state, bool forward,
 		return false;
 	}
 
+	/* Record abbreviated key for caller */
+	if (state->sortKeys->abbrev_converter && abbrev)
+		*abbrev = stup.datum1;
+
 	if (stup.isnull1 || state->datumTypeByVal)
 	{
 		*val = stup.datum1;
@@ -3166,7 +3193,8 @@ copytup_heap(Tuplesortstate *state, SortTuple *stup, void *tup)
 		 * converter it won't expect NULL values, and cost model is not
 		 * required to account for NULL, so in that case we avoid calling
 		 * converter and just set datum1 to zeroed representation (to be
-		 * consistent).
+		 * consistent, and to support cheap inequality tests for NULL
+		 * abbreviated keys).
 		 */
 		stup->datum1 = original;
 	}
@@ -3408,7 +3436,8 @@ copytup_cluster(Tuplesortstate *state, SortTuple *stup, void *tup)
 		 * converter it won't expect NULL values, and cost model is not
 		 * required to account for NULL, so in that case we avoid calling
 		 * converter and just set datum1 to zeroed representation (to be
-		 * consistent).
+		 * consistent, and to support cheap inequality tests for NULL
+		 * abbreviated keys).
 		 */
 		stup->datum1 = original;
 	}
@@ -3712,7 +3741,8 @@ copytup_index(Tuplesortstate *state, SortTuple *stup, void *tup)
 		 * converter it won't expect NULL values, and cost model is not
 		 * required to account for NULL, so in that case we avoid calling
 		 * converter and just set datum1 to zeroed representation (to be
-		 * consistent).
+		 * consistent, and to support cheap inequality tests for NULL
+		 * abbreviated keys).
 		 */
 		stup->datum1 = original;
 	}
diff --git a/src/include/utils/tuplesort.h b/src/include/utils/tuplesort.h
index de6fc56..9af1159 100644
--- a/src/include/utils/tuplesort.h
+++ b/src/include/utils/tuplesort.h
@@ -93,13 +93,13 @@ extern void tuplesort_putdatum(Tuplesortstate *state, Datum val,
 extern void tuplesort_performsort(Tuplesortstate *state);
 
 extern bool tuplesort_gettupleslot(Tuplesortstate *state, bool forward,
-					   TupleTableSlot *slot);
+					   TupleTableSlot *slot, Datum *abbrev);
 extern HeapTuple tuplesort_getheaptuple(Tuplesortstate *state, bool forward,
 					   bool *should_free);
 extern IndexTuple tuplesort_getindextuple(Tuplesortstate *state, bool forward,
 						bool *should_free);
 extern bool tuplesort_getdatum(Tuplesortstate *state, bool forward,
-				   Datum *val, bool *isNull);
+				   Datum *val, bool *isNull, Datum *abbrev);
 
 extern bool tuplesort_skiptuples(Tuplesortstate *state, int64 ntuples,
 					 bool forward);
-- 
1.9.1

From 1fb207b7dfdba7265b2a7806416b9cf384931f4d Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <peter.geoghega...@gmail.com>
Date: Thu, 9 Jul 2015 15:59:07 -0700
Subject: [PATCH 1/2] Tighten SortSupport abbreviated key contract

Revise the contract that SortSupport has with its clients to forbid an
abbreviated comparator ever returning 0 ("indeterminate") in the event
of two abbreviated keys not being bitwise equal.  This gives certain
cases the leeway to inexpensively determine tuple inequality from afar,
by comparing abbreviated keys using simple unsigned integer (Datum)
comparisons, and trusting that bitwise inequality amounts to leading
attribute inequality (and, therefore, tuple inequality).  Direct access
to SortSupport state is consequently not required, which will be
convenient for tuplesort clients (that only have an opaque state
pointer), while also saving us a few additional cycles.

Cheap inequality checks of this nature might otherwise give wrong
answers due (for example) to some opclass abbreviated comparator
interpreting non-identical abbreviated keys with respect to some state
built during memtuple copying (although that is probably still safe,
depending on the exact details).  Pass-by-reference abbreviated keys
could similarly cause problems, and so are now forbidden.  No existing
opclass with abbreviation support does anything like this, so no change
in any opclass SortSupport routine.

Finally, move code that handles the "one initial run" external sort
optimization in mergeruns() to occur immediately after (not immediately
before) SortSupport state has been set to be consistent with
abbreviation not being in play (this is needed because abbreviated keys
are never stored on tape).  The existing order of those two things is
not significant, but only because the question of further tuple
comparisons never arises with the "one initial run" external sort
optimization.  The question of cheap inequality tests can be expected to
arise in a future commit, though.  This is also tidier.

Backpatch to 9.5, where abbreviated keys were introduced.
---
 src/backend/utils/sort/tuplesort.c | 52 +++++++++++++++++++++++---------------
 src/include/utils/sortsupport.h    | 16 +++++++++++-
 2 files changed, 46 insertions(+), 22 deletions(-)

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 6572af8..5e12c0b 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -930,7 +930,17 @@ tuplesort_begin_datum(Oid datumType, Oid sortOperator, Oid sortCollation,
 	state->sortKeys->ssup_collation = sortCollation;
 	state->sortKeys->ssup_nulls_first = nullsFirstFlag;
 
-	/* abbreviation is possible here only for by-reference types */
+	/*
+	 * Abbreviation is possible here only for by-reference types.  Note that a
+	 * pass-by-value representation for abbreviated values is forbidden, but
+	 * that's a distinct, generic restriction imposed by the SortSupport
+	 * contract.
+	 *
+	 * It's possible for a pass-by-value type to have a pass-by-value
+	 * abbreviated key representation, presumably with cheaper comparisons, but
+	 * that could never be used here.  Abbreviated keys are mostly about cache
+	 * performance, so this doesn't seem like a serious restriction.
+	 */
 	state->sortKeys->abbreviate = !typbyval;
 
 	PrepareSortSupportFromOrderingOp(sortOperator, state->sortKeys);
@@ -1271,7 +1281,7 @@ tuplesort_putindextuplevalues(Tuplesortstate *state, Relation rel,
 		 * Store ordinary Datum representation, or NULL value.  If there is a
 		 * converter it won't expect NULL values, and cost model is not
 		 * required to account for NULL, so in that case we avoid calling
-		 * converter and just set datum1 to "void" representation (to be
+		 * converter and just set datum1 to zeroed representation (to be
 		 * consistent).
 		 */
 		stup.datum1 = original;
@@ -2224,6 +2234,22 @@ mergeruns(Tuplesortstate *state)
 	Assert(state->status == TSS_BUILDRUNS);
 	Assert(state->memtupcount == 0);
 
+	if (state->sortKeys != NULL && state->sortKeys->abbrev_converter != NULL)
+	{
+		/*
+		 * If there are multiple runs to be merged, when we go to read back
+		 * tuples from disk, abbreviated keys will not have been stored, and
+		 * we don't care to regenerate them.  Disable abbreviation from this
+		 * point on.
+		 */
+		state->sortKeys->abbrev_converter = NULL;
+		state->sortKeys->comparator = state->sortKeys->abbrev_full_comparator;
+
+		/* Not strictly necessary, but be tidy */
+		state->sortKeys->abbrev_abort = NULL;
+		state->sortKeys->abbrev_full_comparator = NULL;
+	}
+
 	/*
 	 * If we produced only one initial run (quite likely if the total data
 	 * volume is between 1X and 2X workMem), we can just use that tape as the
@@ -2239,22 +2265,6 @@ mergeruns(Tuplesortstate *state)
 		return;
 	}
 
-	if (state->sortKeys != NULL && state->sortKeys->abbrev_converter != NULL)
-	{
-		/*
-		 * If there are multiple runs to be merged, when we go to read back
-		 * tuples from disk, abbreviated keys will not have been stored, and
-		 * we don't care to regenerate them.  Disable abbreviation from this
-		 * point on.
-		 */
-		state->sortKeys->abbrev_converter = NULL;
-		state->sortKeys->comparator = state->sortKeys->abbrev_full_comparator;
-
-		/* Not strictly necessary, but be tidy */
-		state->sortKeys->abbrev_abort = NULL;
-		state->sortKeys->abbrev_full_comparator = NULL;
-	}
-
 	/* End of step D2: rewind all output tapes to prepare for merging */
 	for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
 		LogicalTapeRewind(state->tapeset, tapenum, false);
@@ -3155,7 +3165,7 @@ copytup_heap(Tuplesortstate *state, SortTuple *stup, void *tup)
 		 * Store ordinary Datum representation, or NULL value.  If there is a
 		 * converter it won't expect NULL values, and cost model is not
 		 * required to account for NULL, so in that case we avoid calling
-		 * converter and just set datum1 to "void" representation (to be
+		 * converter and just set datum1 to zeroed representation (to be
 		 * consistent).
 		 */
 		stup->datum1 = original;
@@ -3397,7 +3407,7 @@ copytup_cluster(Tuplesortstate *state, SortTuple *stup, void *tup)
 		 * Store ordinary Datum representation, or NULL value.  If there is a
 		 * converter it won't expect NULL values, and cost model is not
 		 * required to account for NULL, so in that case we avoid calling
-		 * converter and just set datum1 to "void" representation (to be
+		 * converter and just set datum1 to zeroed representation (to be
 		 * consistent).
 		 */
 		stup->datum1 = original;
@@ -3701,7 +3711,7 @@ copytup_index(Tuplesortstate *state, SortTuple *stup, void *tup)
 		 * Store ordinary Datum representation, or NULL value.  If there is a
 		 * converter it won't expect NULL values, and cost model is not
 		 * required to account for NULL, so in that case we avoid calling
-		 * converter and just set datum1 to "void" representation (to be
+		 * converter and just set datum1 to zeroed representation (to be
 		 * consistent).
 		 */
 		stup->datum1 = original;
diff --git a/src/include/utils/sortsupport.h b/src/include/utils/sortsupport.h
index 4258630..5a110f1 100644
--- a/src/include/utils/sortsupport.h
+++ b/src/include/utils/sortsupport.h
@@ -116,7 +116,7 @@ typedef struct SortSupportData
 	 *
 	 * This allows opclass authors to supply a conversion routine, used to
 	 * create an alternative representation of the underlying type (an
-	 * "abbreviated key").  Typically, this representation is an ad-hoc,
+	 * "abbreviated key").  This representation must have an ad-hoc,
 	 * pass-by-value Datum format that only the opclass has knowledge of.  An
 	 * alternative comparator, used only with this alternative representation
 	 * must also be provided (which is assigned to "comparator").  This
@@ -135,6 +135,20 @@ typedef struct SortSupportData
 	 * cache miss penalties are expensive; to get good overall performance,
 	 * sort infrastructure must heavily weigh cache performance.
 	 *
+	 * There is one limited way in which the core code has knowledge of any
+	 * abbreviated key format:  it uses it directly for inequality.
+	 * Specifically, it assumes that two non-bitwise-identical abbreviated keys
+	 * could never have their abbreviated comparator return a 0 or
+	 * "inconclusive" result; in other words, it assumes that abbreviated
+	 * bitwise inequality is a reliable proxy for authoritative value
+	 * inequality.  This restriction enables cheap inequality tests made from a
+	 * distance, without opclass involvement.  Furthermore, it implies that
+	 * abbreviated keys themselves must always be pass-by-value, since this
+	 * won't work with pointers.  Note that it is still permitted for the
+	 * abbreviated comparator to use state generated during calls to
+	 * abbrev_converter -- for example, a lookup table could be used, provided
+	 * any identifier baked into the abbreviated representation is canonical.
+	 *
 	 * Opclass authors must consider the final cardinality of abbreviated keys
 	 * when devising an encoding scheme.  It's possible for a strategy to work
 	 * better than an alternative strategy with one usage pattern, while the
-- 
1.9.1

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: Reusing abbreviated keys during second pass of ordered [set] aggregates

Reply via email to