Re: [HACKERS] Re: Reusing abbreviated keys during second pass of ordered [set] aggregates

Peter Geoghegan Sat, 10 Oct 2015 18:03:53 -0700

On Fri, Sep 25, 2015 at 2:39 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> This needs a rebase, there are several conflicts in 
> src/backend/executor/nodeAgg.c


I attached a revised version of the second patch in the series, fixing
this bitrot.

I also noticed a bug in tuplesort's TSS_SORTEDONTAPE case with the
previous patch, where no final on-the-fly merge step is required (no
merge step is required whatsoever, because replacement selection
managed to produce only one run). The function mergeruns() previously
only "abandoned" abbreviated ahead of any merge step iff there was
one. When there was only one run (not requiring a merge) it happened
to continue to keep its state consistent with abbreviated keys still
being in use. It didn't matter before, because abbreviated keys were
only for tuplesort to compare, but that's different now.

That bug is fixed in this revision by reordering things within
mergeruns(). The previous order of the two things that were switched
is not at all significant (I should know, I wrote that code).

Thanks
-- 
Peter Geoghegan

From a82f0f723e1f5206d9c19b1b277acc0625aa54b1 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <peter.geoghega...@gmail.com>
Date: Mon, 6 Jul 2015 13:37:26 -0700
Subject: [PATCH 2/2] Reuse abbreviated keys in ordered [set] aggregates

When processing ordered aggregates following a sort that could make use
of the abbreviated key optimization, only call the equality operator to
compare successive pairs of tuples when their abbreviated keys were not
equal.  Only strict abbreviated key binary inequality is considered,
which is safe.
---
 src/backend/catalog/index.c            |  2 +-
 src/backend/executor/nodeAgg.c         | 20 ++++++---
 src/backend/executor/nodeSort.c        |  2 +-
 src/backend/utils/adt/orderedsetaggs.c | 33 ++++++++++-----
 src/backend/utils/sort/tuplesort.c     | 74 +++++++++++++++++++++++-----------
 src/include/utils/tuplesort.h          |  4 +-
 6 files changed, 93 insertions(+), 42 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index e59b163..bb61018 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -3024,7 +3024,7 @@ validate_index_heapscan(Relation heapRelation,
 			}
 
 			tuplesort_empty = !tuplesort_getdatum(state->tuplesort, true,
-												  &ts_val, &ts_isnull);
+												  &ts_val, &ts_isnull, NULL);
 			Assert(tuplesort_empty || !ts_isnull);
 			indexcursor = (ItemPointer) DatumGetPointer(ts_val);
 		}
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 2e36855..2972180 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -539,7 +539,8 @@ fetch_input_tuple(AggState *aggstate)
 
 	if (aggstate->sort_in)
 	{
-		if (!tuplesort_gettupleslot(aggstate->sort_in, true, aggstate->sort_slot))
+		if (!tuplesort_gettupleslot(aggstate->sort_in, true,
+									aggstate->sort_slot, NULL))
 			return NULL;
 		slot = aggstate->sort_slot;
 	}
@@ -894,8 +895,8 @@ advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup)
  * The one-input case is handled separately from the multi-input case
  * for performance reasons: for single by-value inputs, such as the
  * common case of count(distinct id), the tuplesort_getdatum code path
- * is around 300% faster.  (The speedup for by-reference types is less
- * but still noticeable.)
+ * is around 300% faster.  (The speedup for by-reference types without
+ * abbreviated key support is less but still noticeable.)
  *
  * This function handles only one grouping set (already set in
  * aggstate->current_set).
@@ -913,6 +914,8 @@ process_ordered_aggregate_single(AggState *aggstate,
 	MemoryContext workcontext = aggstate->tmpcontext->ecxt_per_tuple_memory;
 	MemoryContext oldContext;
 	bool		isDistinct = (pertrans->numDistinctCols > 0);
+	Datum		newAbbrevVal = (Datum) 0;
+	Datum		oldAbbrevVal = (Datum) 0;
 	FunctionCallInfo fcinfo = &pertrans->transfn_fcinfo;
 	Datum	   *newVal;
 	bool	   *isNull;
@@ -932,7 +935,7 @@ process_ordered_aggregate_single(AggState *aggstate,
 	 */
 
 	while (tuplesort_getdatum(pertrans->sortstates[aggstate->current_set],
-							  true, newVal, isNull))
+							  true, newVal, isNull, &newAbbrevVal))
 	{
 		/*
 		 * Clear and select the working context for evaluation of the equality
@@ -950,6 +953,7 @@ process_ordered_aggregate_single(AggState *aggstate,
 			haveOldVal &&
 			((oldIsNull && *isNull) ||
 			 (!oldIsNull && !*isNull &&
+			  oldAbbrevVal == newAbbrevVal &&
 			  DatumGetBool(FunctionCall2(&pertrans->equalfns[0],
 										 oldVal, *newVal)))))
 		{
@@ -965,6 +969,7 @@ process_ordered_aggregate_single(AggState *aggstate,
 				pfree(DatumGetPointer(oldVal));
 			/* and remember the new one for subsequent equality checks */
 			oldVal = *newVal;
+			oldAbbrevVal = newAbbrevVal;
 			oldIsNull = *isNull;
 			haveOldVal = true;
 		}
@@ -1002,6 +1007,8 @@ process_ordered_aggregate_multi(AggState *aggstate,
 	TupleTableSlot *slot2 = pertrans->uniqslot;
 	int			numTransInputs = pertrans->numTransInputs;
 	int			numDistinctCols = pertrans->numDistinctCols;
+	Datum		newAbbrevVal = (Datum) 0;
+	Datum		oldAbbrevVal = (Datum) 0;
 	bool		haveOldValue = false;
 	int			i;
 
@@ -1012,7 +1019,7 @@ process_ordered_aggregate_multi(AggState *aggstate,
 		ExecClearTuple(slot2);
 
 	while (tuplesort_gettupleslot(pertrans->sortstates[aggstate->current_set],
-								  true, slot1))
+								  true, slot1, &newAbbrevVal))
 	{
 		/*
 		 * Extract the first numTransInputs columns as datums to pass to the
@@ -1023,6 +1030,7 @@ process_ordered_aggregate_multi(AggState *aggstate,
 
 		if (numDistinctCols == 0 ||
 			!haveOldValue ||
+			newAbbrevVal != oldAbbrevVal ||
 			!execTuplesMatch(slot1, slot2,
 							 numDistinctCols,
 							 pertrans->sortColIdx,
@@ -1046,6 +1054,8 @@ process_ordered_aggregate_multi(AggState *aggstate,
 
 				slot2 = slot1;
 				slot1 = tmpslot;
+				/* avoid execTuplesMatch() calls by reusing abbreviated keys */
+				oldAbbrevVal = newAbbrevVal;
 				haveOldValue = true;
 			}
 		}
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index af1dccf..35dd993 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -137,7 +137,7 @@ ExecSort(SortState *node)
 	slot = node->ss.ps.ps_ResultTupleSlot;
 	(void) tuplesort_gettupleslot(tuplesortstate,
 								  ScanDirectionIsForward(dir),
-								  slot);
+								  slot, NULL);
 	return slot;
 }
 
diff --git a/src/backend/utils/adt/orderedsetaggs.c b/src/backend/utils/adt/orderedsetaggs.c
index 39ed85b..1a5f696 100644
--- a/src/backend/utils/adt/orderedsetaggs.c
+++ b/src/backend/utils/adt/orderedsetaggs.c
@@ -453,7 +453,7 @@ percentile_disc_final(PG_FUNCTION_ARGS)
 			elog(ERROR, "missing row in percentile_disc");
 	}
 
-	if (!tuplesort_getdatum(osastate->sortstate, true, &val, &isnull))
+	if (!tuplesort_getdatum(osastate->sortstate, true, &val, &isnull, NULL))
 		elog(ERROR, "missing row in percentile_disc");
 
 	/*
@@ -553,7 +553,7 @@ percentile_cont_final_common(FunctionCallInfo fcinfo,
 	if (!tuplesort_skiptuples(osastate->sortstate, first_row, true))
 		elog(ERROR, "missing row in percentile_cont");
 
-	if (!tuplesort_getdatum(osastate->sortstate, true, &first_val, &isnull))
+	if (!tuplesort_getdatum(osastate->sortstate, true, &first_val, &isnull, NULL))
 		elog(ERROR, "missing row in percentile_cont");
 	if (isnull)
 		PG_RETURN_NULL();
@@ -564,7 +564,7 @@ percentile_cont_final_common(FunctionCallInfo fcinfo,
 	}
 	else
 	{
-		if (!tuplesort_getdatum(osastate->sortstate, true, &second_val, &isnull))
+		if (!tuplesort_getdatum(osastate->sortstate, true, &second_val, &isnull, NULL))
 			elog(ERROR, "missing row in percentile_cont");
 
 		if (isnull)
@@ -792,7 +792,7 @@ percentile_disc_multi_final(PG_FUNCTION_ARGS)
 				if (!tuplesort_skiptuples(osastate->sortstate, target_row - rownum - 1, true))
 					elog(ERROR, "missing row in percentile_disc");
 
-				if (!tuplesort_getdatum(osastate->sortstate, true, &val, &isnull))
+				if (!tuplesort_getdatum(osastate->sortstate, true, &val, &isnull, NULL))
 					elog(ERROR, "missing row in percentile_disc");
 
 				rownum = target_row;
@@ -921,7 +921,8 @@ percentile_cont_multi_final_common(FunctionCallInfo fcinfo,
 				if (!tuplesort_skiptuples(osastate->sortstate, first_row - rownum - 1, true))
 					elog(ERROR, "missing row in percentile_cont");
 
-				if (!tuplesort_getdatum(osastate->sortstate, true, &first_val, &isnull) || isnull)
+				if (!tuplesort_getdatum(osastate->sortstate, true, &first_val,
+										&isnull, NULL) || isnull)
 					elog(ERROR, "missing row in percentile_cont");
 
 				rownum = first_row;
@@ -941,7 +942,8 @@ percentile_cont_multi_final_common(FunctionCallInfo fcinfo,
 			/* Fetch second_row if needed */
 			if (second_row > rownum)
 			{
-				if (!tuplesort_getdatum(osastate->sortstate, true, &second_val, &isnull) || isnull)
+				if (!tuplesort_getdatum(osastate->sortstate, true, &second_val,
+										&isnull, NULL) || isnull)
 					elog(ERROR, "missing row in percentile_cont");
 				rownum++;
 			}
@@ -1016,6 +1018,8 @@ mode_final(PG_FUNCTION_ARGS)
 	int64		last_val_freq = 0;
 	bool		last_val_is_mode = false;
 	FmgrInfo   *equalfn;
+	Datum		abbrev_val = (Datum) 0;
+	Datum		last_abbrev_val = (Datum) 0;
 	bool		shouldfree;
 
 	Assert(AggCheckCallContext(fcinfo, NULL) == AGG_CONTEXT_AGGREGATE);
@@ -1042,7 +1046,7 @@ mode_final(PG_FUNCTION_ARGS)
 	tuplesort_performsort(osastate->sortstate);
 
 	/* Scan tuples and count frequencies */
-	while (tuplesort_getdatum(osastate->sortstate, true, &val, &isnull))
+	while (tuplesort_getdatum(osastate->sortstate, true, &val, &isnull, &abbrev_val))
 	{
 		/* we don't expect any nulls, but ignore them if found */
 		if (isnull)
@@ -1054,8 +1058,10 @@ mode_final(PG_FUNCTION_ARGS)
 			mode_val = last_val = val;
 			mode_freq = last_val_freq = 1;
 			last_val_is_mode = true;
+			last_abbrev_val = abbrev_val;
 		}
-		else if (DatumGetBool(FunctionCall2(equalfn, val, last_val)))
+		else if (abbrev_val == last_abbrev_val &&
+				 DatumGetBool(FunctionCall2(equalfn, val, last_val)))
 		{
 			/* value equal to previous value, count it */
 			if (last_val_is_mode)
@@ -1078,6 +1084,8 @@ mode_final(PG_FUNCTION_ARGS)
 			if (shouldfree && !last_val_is_mode)
 				pfree(DatumGetPointer(last_val));
 			last_val = val;
+			/* avoid equality function calls by reusing abbreviated keys */
+			last_abbrev_val = abbrev_val;
 			last_val_freq = 1;
 			last_val_is_mode = false;
 		}
@@ -1181,7 +1189,7 @@ hypothetical_rank_common(FunctionCallInfo fcinfo, int flag,
 	tuplesort_performsort(osastate->sortstate);
 
 	/* iterate till we find the hypothetical row */
-	while (tuplesort_gettupleslot(osastate->sortstate, true, slot))
+	while (tuplesort_gettupleslot(osastate->sortstate, true, slot, NULL))
 	{
 		bool		isnull;
 		Datum		d = slot_getattr(slot, nargs + 1, &isnull);
@@ -1266,6 +1274,8 @@ hypothetical_dense_rank_final(PG_FUNCTION_ARGS)
 	int64		duplicate_count = 0;
 	OSAPerGroupState *osastate;
 	int			numDistinctCols;
+	Datum		abbrevVal = (Datum) 0;
+	Datum		abbrevOld = (Datum) 0;
 	AttrNumber *sortColIdx;
 	FmgrInfo   *equalfns;
 	TupleTableSlot *slot;
@@ -1342,7 +1352,7 @@ hypothetical_dense_rank_final(PG_FUNCTION_ARGS)
 	slot2 = extraslot;
 
 	/* iterate till we find the hypothetical row */
-	while (tuplesort_gettupleslot(osastate->sortstate, true, slot))
+	while (tuplesort_gettupleslot(osastate->sortstate, true, slot, &abbrevVal))
 	{
 		bool		isnull;
 		Datum		d = slot_getattr(slot, nargs + 1, &isnull);
@@ -1353,6 +1363,7 @@ hypothetical_dense_rank_final(PG_FUNCTION_ARGS)
 
 		/* count non-distinct tuples */
 		if (!TupIsNull(slot2) &&
+			abbrevVal == abbrevOld &&
 			execTuplesMatch(slot, slot2,
 							numDistinctCols,
 							sortColIdx,
@@ -1363,6 +1374,8 @@ hypothetical_dense_rank_final(PG_FUNCTION_ARGS)
 		tmpslot = slot2;
 		slot2 = slot;
 		slot = tmpslot;
+		/* avoid execTuplesMatch() calls by reusing abbreviated keys */
+		abbrevOld = abbrevVal;
 
 		rank++;
 
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index d532e87..49cf4cc 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -1272,7 +1272,8 @@ tuplesort_putindextuplevalues(Tuplesortstate *state, Relation rel,
 		 * converter it won't expect NULL values, and cost model is not
 		 * required to account for NULL, so in that case we avoid calling
 		 * converter and just set datum1 to "void" representation (to be
-		 * consistent).
+		 * consistent, and to support cheap inequality tests for NULL
+		 * abbreviated keys).
 		 */
 		stup.datum1 = original;
 	}
@@ -1334,7 +1335,9 @@ tuplesort_putdatum(Tuplesortstate *state, Datum val, bool isNull)
 	 * stup.tuple and is treated as the canonical copy (e.g. to return via
 	 * tuplesort_getdatum or when writing to tape); stup.datum1 gets the
 	 * abbreviated value if abbreviation is happening, otherwise it's
-	 * identical to stup.tuple.
+	 * identical to stup.tuple. stup.datum1 has a "void" representation for
+	 * NULL values, which abbreviated keys require for cheap inequality
+	 * checks.
 	 */
 
 	if (isNull || state->datumTypeByVal)
@@ -1854,10 +1857,17 @@ tuplesort_gettuple_common(Tuplesortstate *state, bool forward,
  * Fetch the next tuple in either forward or back direction.
  * If successful, put tuple in slot and return TRUE; else, clear the slot
  * and return FALSE.
+ *
+ * Caller may optionally be passed back abbreviated value (on TRUE return
+ * value) when abbreviation was used, which can be used to cheaply avoid
+ * equality checks that might otherwise be required.  Caller can safely make a
+ * determination of "non-equal tuple" based on simple binary inequality.  NULL
+ * value in leading attribute will set abbreviated value to "void" abbreviated
+ * representation, which caller may rely on in abbreviated inequality check.
  */
 bool
 tuplesort_gettupleslot(Tuplesortstate *state, bool forward,
-					   TupleTableSlot *slot)
+					   TupleTableSlot *slot, Datum *abbrev)
 {
 	MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext);
 	SortTuple	stup;
@@ -1870,6 +1880,10 @@ tuplesort_gettupleslot(Tuplesortstate *state, bool forward,
 
 	if (stup.tuple)
 	{
+		/* Record abbreviated key for caller */
+		if (state->sortKeys->abbrev_converter && abbrev)
+			*abbrev = stup.datum1;
+
 		ExecStoreMinimalTuple((MinimalTuple) stup.tuple, slot, should_free);
 		return true;
 	}
@@ -1925,10 +1939,17 @@ tuplesort_getindextuple(Tuplesortstate *state, bool forward,
  *
  * If the Datum is pass-by-ref type, the returned value is freshly palloc'd
  * and is now owned by the caller.
+ *
+ * Caller may optionally be passed back abbreviated value (on TRUE return
+ * value) when abbreviation was used, which can be used to cheaply avoid
+ * equality checks that might otherwise be required.  Caller can safely make a
+ * determination of "non-equal tuple" based on simple binary inequality.  NULL
+ * value in leading attribute will set abbreviated value to "void" abbreviated
+ * representation, which caller may rely on in abbreviated inequality check.
  */
 bool
 tuplesort_getdatum(Tuplesortstate *state, bool forward,
-				   Datum *val, bool *isNull)
+				   Datum *val, bool *isNull, Datum *abbrev)
 {
 	MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext);
 	SortTuple	stup;
@@ -1940,6 +1961,10 @@ tuplesort_getdatum(Tuplesortstate *state, bool forward,
 		return false;
 	}
 
+	/* Record abbreviated key for caller */
+	if (state->sortKeys->abbrev_converter && abbrev)
+		*abbrev = stup.datum1;
+
 	if (stup.isnull1 || state->datumTypeByVal)
 	{
 		*val = stup.datum1;
@@ -2220,6 +2245,22 @@ mergeruns(Tuplesortstate *state)
 	Assert(state->status == TSS_BUILDRUNS);
 	Assert(state->memtupcount == 0);
 
+	if (state->sortKeys != NULL && state->sortKeys->abbrev_converter != NULL)
+	{
+		/*
+		 * If there are multiple runs to be merged, when we go to read back
+		 * tuples from disk, abbreviated keys will not have been stored, and
+		 * we don't care to regenerate them.  Disable abbreviation from this
+		 * point on.
+		 */
+		state->sortKeys->abbrev_converter = NULL;
+		state->sortKeys->comparator = state->sortKeys->abbrev_full_comparator;
+
+		/* Not strictly necessary, but be tidy */
+		state->sortKeys->abbrev_abort = NULL;
+		state->sortKeys->abbrev_full_comparator = NULL;
+	}
+
 	/*
 	 * If we produced only one initial run (quite likely if the total data
 	 * volume is between 1X and 2X workMem), we can just use that tape as the
@@ -2235,22 +2276,6 @@ mergeruns(Tuplesortstate *state)
 		return;
 	}
 
-	if (state->sortKeys != NULL && state->sortKeys->abbrev_converter != NULL)
-	{
-		/*
-		 * If there are multiple runs to be merged, when we go to read back
-		 * tuples from disk, abbreviated keys will not have been stored, and
-		 * we don't care to regenerate them.  Disable abbreviation from this
-		 * point on.
-		 */
-		state->sortKeys->abbrev_converter = NULL;
-		state->sortKeys->comparator = state->sortKeys->abbrev_full_comparator;
-
-		/* Not strictly necessary, but be tidy */
-		state->sortKeys->abbrev_abort = NULL;
-		state->sortKeys->abbrev_full_comparator = NULL;
-	}
-
 	/* End of step D2: rewind all output tapes to prepare for merging */
 	for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
 		LogicalTapeRewind(state->tapeset, tapenum, false);
@@ -3152,7 +3177,8 @@ copytup_heap(Tuplesortstate *state, SortTuple *stup, void *tup)
 		 * converter it won't expect NULL values, and cost model is not
 		 * required to account for NULL, so in that case we avoid calling
 		 * converter and just set datum1 to "void" representation (to be
-		 * consistent).
+		 * consistent, and to support cheap inequality tests for NULL
+		 * abbreviated keys).
 		 */
 		stup->datum1 = original;
 	}
@@ -3392,7 +3418,8 @@ copytup_cluster(Tuplesortstate *state, SortTuple *stup, void *tup)
 		 * converter it won't expect NULL values, and cost model is not
 		 * required to account for NULL, so in that case we avoid calling
 		 * converter and just set datum1 to "void" representation (to be
-		 * consistent).
+		 * consistent, and to support cheap inequality tests for NULL
+		 * abbreviated keys).
 		 */
 		stup->datum1 = original;
 	}
@@ -3694,7 +3721,8 @@ copytup_index(Tuplesortstate *state, SortTuple *stup, void *tup)
 		 * converter it won't expect NULL values, and cost model is not
 		 * required to account for NULL, so in that case we avoid calling
 		 * converter and just set datum1 to "void" representation (to be
-		 * consistent).
+		 * consistent, and to support cheap inequality tests for NULL
+		 * abbreviated keys).
 		 */
 		stup->datum1 = original;
 	}
diff --git a/src/include/utils/tuplesort.h b/src/include/utils/tuplesort.h
index de6fc56..9af1159 100644
--- a/src/include/utils/tuplesort.h
+++ b/src/include/utils/tuplesort.h
@@ -93,13 +93,13 @@ extern void tuplesort_putdatum(Tuplesortstate *state, Datum val,
 extern void tuplesort_performsort(Tuplesortstate *state);
 
 extern bool tuplesort_gettupleslot(Tuplesortstate *state, bool forward,
-					   TupleTableSlot *slot);
+					   TupleTableSlot *slot, Datum *abbrev);
 extern HeapTuple tuplesort_getheaptuple(Tuplesortstate *state, bool forward,
 					   bool *should_free);
 extern IndexTuple tuplesort_getindextuple(Tuplesortstate *state, bool forward,
 						bool *should_free);
 extern bool tuplesort_getdatum(Tuplesortstate *state, bool forward,
-				   Datum *val, bool *isNull);
+				   Datum *val, bool *isNull, Datum *abbrev);
 
 extern bool tuplesort_skiptuples(Tuplesortstate *state, int64 ntuples,
 					 bool forward);
-- 
1.9.1

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: Reusing abbreviated keys during second pass of ordered [set] aggregates

Reply via email to