On 11/03/2016 01:07 PM, Andres Freund wrote:
Hi,

There are two things I found while working on faster expression
evaluation, slot deforming, and batched execution. As those two issues
often seemed quite dominant cost-wise, it seemed worthwhile to evaluate
them independently.

1) We currently do one ExecProject() to compute each aggregate's
   arguments. Turns out it's noticeably faster to compute the arguments
   for all aggregates in one go, both because it reduces the number of
   function calls / moves more work into a relatively tight loop, and
   because it allows deforming all the required columns at once, rather
   than one-by-one.  For a single aggregate it'd be faster to avoid
   ExecProject altogether (i.e. directly evaluate the expression as we
   used to), but as soon as you have two, the new approach is faster.

Makes sense. If we do your refactoring of ExecEvalExpr into an intermediate opcode representation, I assume the performance difference will go away anyway.
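
For anyone following along, here's a self-contained toy model of the deforming cost (plain C, not executor code; the column numbers and the "decoding" are made up) showing why one combined projection beats per-aggregate projections when decoding column k means walking every column before it:

/*
 * Toy model: decoding column k requires walking columns 0..k, roughly like
 * deforming a heap tuple whose earlier attributes are variable-width.
 */
#include <stdio.h>

#define NATTS 8

typedef struct ToyTuple { int raw[NATTS]; } ToyTuple;

/* Decode columns 0..upto-1 into values[]; return the number of decode steps. */
static int
toy_deform(const ToyTuple *tup, int values[NATTS], int upto)
{
	int		i;

	for (i = 0; i < upto; i++)
		values[i] = tup->raw[i] * 2;	/* stand-in for attribute decoding */
	return upto;
}

int
main(void)
{
	ToyTuple	tup = {{1, 2, 3, 4, 5, 6, 7, 8}};
	int			aggcols[3] = {2, 5, 7};	/* 0-based columns used by three aggregates */
	int			values[NATTS];
	int			separate = 0;
	int			combined;
	int			a;

	/* One "projection" per aggregate: each one deforms from the start again. */
	for (a = 0; a < 3; a++)
		separate += toy_deform(&tup, values, aggcols[a] + 1);

	/* One combined projection: deform up to the highest needed column once. */
	combined = toy_deform(&tup, values, aggcols[2] + 1);

	printf("decode steps: separate=%d, combined=%d\n", separate, combined);
	return 0;
}

With these made-up columns that's 17 decode steps versus 8; the shape of the saving in the executor is the same, even if the numbers obviously aren't.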

2) For hash-aggs, right now we store the representative tuple using
   the input tuple's format, with unneeded columns set to NULL. That
   turns out to be expensive if the aggregated-on columns are not
   leading columns, because we have to skip over a potentially large
   number of NULLs.  The fix here is to simply use a different tuple
   format for the hashtable.  That doesn't cause overhead, because we
   already move columns in/out of the hashslot explicitly anyway.

Heh, I came to the same conclusion a couple of months ago when I was profiling the aggregate code. I never got around to finishing up and posting the patch I wrote back then, but here you go, for comparison. It's pretty much the same as what you got here. So yeah, seems like a good idea.
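
To spell the mapping out in miniature (a standalone sketch: plain arrays stand in for the slots' tts_values/tts_isnull, and everything else is simplified, so don't read the types too literally), hashCols[i] gives the 1-based input attno stored at position i of the compact hash-table tuple; the same array is used to copy values into the hash slot on lookup and to expand them back out, NULL-padded, when returning groups:

#include <stdbool.h>
#include <stdio.h>

#define INPUT_NATTS 6

int
main(void)
{
	/* Input tuple; only columns 4 and 6 are grouping columns. */
	int			in_values[INPUT_NATTS] = {10, 20, 30, 40, 50, 60};
	bool		in_isnull[INPUT_NATTS] = {false, false, false, false, false, false};

	int			hashCols[2] = {4, 6};	/* 1-based attnos, like grpColIdx */
	int			numHashCols = 2;

	int			hash_values[2];
	bool		hash_isnull[2];
	int			out_values[INPUT_NATTS];
	bool		out_isnull[INPUT_NATTS];
	int			i;

	/* lookup_hash_entry: transfer just the needed columns into the hash slot. */
	for (i = 0; i < numHashCols; i++)
	{
		hash_values[i] = in_values[hashCols[i] - 1];
		hash_isnull[i] = in_isnull[hashCols[i] - 1];
	}

	/* agg_retrieve_hash_table: expand back to input format, the rest stays NULL. */
	for (i = 0; i < INPUT_NATTS; i++)
	{
		out_values[i] = 0;
		out_isnull[i] = true;
	}
	for (i = 0; i < numHashCols; i++)
	{
		out_values[hashCols[i] - 1] = hash_values[i];
		out_isnull[hashCols[i] - 1] = hash_isnull[i];
	}

	for (i = 0; i < INPUT_NATTS; i++)
	{
		if (out_isnull[i])
			printf("col %d: NULL\n", i + 1);
		else
			printf("col %d: %d\n", i + 1, out_values[i]);
	}
	return 0;
}

The patch below does the same copies against the slots' tts_values/tts_isnull arrays, with the hash slot's TupleDesc built from the grouped columns of the input descriptor.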

- Heikki

From 18a5098a340e7c8a18bea7e83f87b181f65d1976 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Thu, 1 Sep 2016 10:42:32 +0300
Subject: [PATCH 1/1] Don't store unused columns in hash table.

---
 src/backend/executor/nodeAgg.c | 129 ++++++++++++++++++++++++++++++++---------
 src/include/nodes/execnodes.h  |   6 +-
 2 files changed, 108 insertions(+), 27 deletions(-)

diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index ce2fc28..2521423 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -1654,6 +1654,13 @@ build_hash_table(AggState *aggstate)
 	Agg		   *node = (Agg *) aggstate->ss.ps.plan;
 	MemoryContext tmpmem = aggstate->tmpcontext->ecxt_per_tuple_memory;
 	Size		entrysize;
+	int			i;
+	AttrNumber *dummyColIdx;
+
+	dummyColIdx = MemoryContextAlloc(aggstate->aggcontexts[0]->ecxt_per_tuple_memory,
+									 aggstate->numHashCols * sizeof(AttrNumber));
+	for (i = 0; i < aggstate->numHashCols; i++)
+		dummyColIdx[i] = i + 1;
 
 	Assert(node->aggstrategy == AGG_HASHED);
 	Assert(node->numGroups > 0);
@@ -1661,8 +1668,8 @@ build_hash_table(AggState *aggstate)
 	entrysize = offsetof(AggHashEntryData, pergroup) +
 		aggstate->numaggs * sizeof(AggStatePerGroupData);
 
-	aggstate->hashtable = BuildTupleHashTable(node->numCols,
-											  node->grpColIdx,
+	aggstate->hashtable = BuildTupleHashTable(aggstate->numHashCols,
+											  dummyColIdx,
 											  aggstate->phase->eqfunctions,
 											  aggstate->hashfunctions,
 											  node->numGroups,
@@ -1695,27 +1702,71 @@ build_hash_table(AggState *aggstate)
  * columns.  However, the search will be needed when we add support for
  * SQL99 semantics that allow use of "functionally dependent" columns that
  * haven't been explicitly grouped by.
+ *
+ *
+ *
+ * We build a mapping array that gives, for each column of the hash-table
+ * tuple, the corresponding input column; it is used to convert both ways.
  */
-static List *
+static void
 find_hash_columns(AggState *aggstate)
 {
 	Agg		   *node = (Agg *) aggstate->ss.ps.plan;
-	Bitmapset  *colnos;
-	List	   *collist;
+	Bitmapset  *colnos = NULL;
+	Bitmapset  *other_colnos;
 	int			i;
+	int			numHashCols;
+	AttrNumber *mapping;
+	AttrNumber	maxHashColIdx = 0;
 
 	/* Find Vars that will be needed in tlist and qual */
-	colnos = find_unaggregated_cols(aggstate);
-	/* Add in all the grouping columns */
+	other_colnos = find_unaggregated_cols(aggstate);
+
+	mapping = palloc((bms_num_members(other_colnos) + node->numCols) * sizeof(AttrNumber));
+
+	numHashCols = 0;
+
+	/*
+	 * Begin by putting all the grouping columns in the front, so that they're
+	 * fast to access.
+	 */
 	for (i = 0; i < node->numCols; i++)
-		colnos = bms_add_member(colnos, node->grpColIdx[i]);
-	/* Convert to list, using lcons so largest element ends up first */
-	collist = NIL;
-	while ((i = bms_first_member(colnos)) >= 0)
-		collist = lcons_int(i, collist);
+	{
+		AttrNumber keyColIdx = node->grpColIdx[i];
+
+		mapping[numHashCols++] = keyColIdx;
+		if (keyColIdx > maxHashColIdx)
+			maxHashColIdx = keyColIdx;
+
+		/*
+		 * Note that we don't deduplicate key columns. That would complicate
+		 * the comparisons. Don't write silly queries! (Not sure if that would get
+		 * through the parser and optimizer, though). But make note of the
+		 * columns, so that if they also appear in the targetlist or quals,
+		 * we don't duplicate with those.
+		 */
+		colnos = bms_add_member(colnos, keyColIdx);
+	}
+
+	/*
+	 * Add the other columns to the mapping.
+	 */
+	while ((i = bms_first_member(other_colnos)) >= 0)
+	{
+		if (!bms_is_member(i, colnos))
+		{
+			mapping[numHashCols++] = i;
+			if (i > maxHashColIdx)
+				maxHashColIdx = i;
+			colnos = bms_add_member(colnos, i);
+		}
+	}
 	bms_free(colnos);
+	bms_free(other_colnos);
 
-	return collist;
+	aggstate->hashCols = mapping;
+	aggstate->numHashCols = numHashCols;
+	aggstate->maxHashColIdx = maxHashColIdx;
 }
 
 /*
@@ -1748,27 +1799,39 @@ static AggHashEntry
 lookup_hash_entry(AggState *aggstate, TupleTableSlot *inputslot)
 {
 	TupleTableSlot *hashslot = aggstate->hashslot;
-	ListCell   *l;
+	int			i;
 	AggHashEntry entry;
 	bool		isnew;
 
 	/* if first time through, initialize hashslot by cloning input slot */
 	if (hashslot->tts_tupleDescriptor == NULL)
 	{
-		ExecSetSlotDescriptor(hashslot, inputslot->tts_tupleDescriptor);
-		/* Make sure all unused columns are NULLs */
-		ExecStoreAllNullTuple(hashslot);
+		TupleDesc	hashDesc;
+
+		hashDesc = CreateTemplateTupleDesc(aggstate->numHashCols, false);
+		for (i = 0; i < aggstate->numHashCols; i++)
+		{
+			int			varNumber = aggstate->hashCols[i];
+
+			TupleDescCopyEntry(hashDesc, i + 1,
+							   inputslot->tts_tupleDescriptor,
+							   varNumber);
+		}
+
+		ExecSetSlotDescriptor(hashslot, hashDesc);
 	}
 
 	/* transfer just the needed columns into hashslot */
-	slot_getsomeattrs(inputslot, linitial_int(aggstate->hash_needed));
-	foreach(l, aggstate->hash_needed)
+	slot_getsomeattrs(inputslot, aggstate->maxHashColIdx);
+	ExecClearTuple(hashslot);
+	for (i = 0; i < aggstate->numHashCols; i++)
 	{
-		int			varNumber = lfirst_int(l) - 1;
+		int			varNumber = aggstate->hashCols[i];
 
-		hashslot->tts_values[varNumber] = inputslot->tts_values[varNumber];
-		hashslot->tts_isnull[varNumber] = inputslot->tts_isnull[varNumber];
-	}
+		hashslot->tts_values[i] = inputslot->tts_values[varNumber - 1];
+		hashslot->tts_isnull[i] = inputslot->tts_isnull[varNumber - 1];
+	}
+	ExecStoreVirtualTuple(hashslot);
 
 	/* find or create the hashtable entry using the filtered tuple */
 	entry = (AggHashEntry) LookupTupleHashEntry(aggstate->hashtable,
@@ -2228,6 +2291,8 @@ agg_retrieve_hash_table(AggState *aggstate)
 	AggHashEntry entry;
 	TupleTableSlot *firstSlot;
 	TupleTableSlot *result;
+	TupleTableSlot *hashslot = aggstate->hashslot;
+	int			i;
 
 	/*
 	 * get state info from node
@@ -2268,8 +2333,19 @@ agg_retrieve_hash_table(AggState *aggstate)
 		 * for it, so that it can be used in ExecProject.
 		 */
 		ExecStoreMinimalTuple(entry->shared.firstTuple,
-							  firstSlot,
+							  hashslot,
 							  false);
+		slot_getallattrs(hashslot);
+
+		ExecStoreAllNullTuple(firstSlot);
+		ExecClearTuple(firstSlot);
+		for (i = 0; i < aggstate->numHashCols; i++)
+		{
+			AttrNumber varNumber = aggstate->hashCols[i];
+			firstSlot->tts_values[varNumber - 1] = hashslot->tts_values[i];
+			firstSlot->tts_isnull[varNumber - 1] = hashslot->tts_isnull[i];
+		}
+		ExecStoreVirtualTuple(firstSlot);
 
 		pergroup = entry->pergroup;
 
@@ -2577,10 +2653,11 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 
 	if (node->aggstrategy == AGG_HASHED)
 	{
+		/* Compute the columns we actually need to hash on */
+		find_hash_columns(aggstate);
+
 		build_hash_table(aggstate);
 		aggstate->table_filled = false;
-		/* Compute the columns we actually need to hash on */
-		aggstate->hash_needed = find_hash_columns(aggstate);
 	}
 	else
 	{
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a4ea1b9..bc23c5f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1853,7 +1853,11 @@ typedef struct AggState
 	/* these fields are used in AGG_HASHED mode: */
 	TupleHashTable hashtable;	/* hash table with one entry per group */
 	TupleTableSlot *hashslot;	/* slot for loading hash table */
-	List	   *hash_needed;	/* list of columns needed in hash table */
+
+	AttrNumber *hashCols;		/* mapping from input tuple to hashed tuple */
+	int			numHashCols;
+	AttrNumber	maxHashColIdx;	/* highest input column needed in hash table */
+
 	bool		table_filled;	/* hash table filled yet? */
 	TupleHashIterator hashiter; /* for iterating through hash table */
 } AggState;
-- 
2.9.3
