Re: [HACKERS] Combining Aggregates

Robert Haas Sun, 17 Jan 2016 19:45:49 -0800

On Sun, Jan 17, 2016 at 9:26 PM, David Rowley
<[email protected]> wrote:
> hmm, so wouldn't that mean that the transition function would need to (for
> each input tuple):
>
> 1. Parse that StringInfo into tokens.
> 2. Create a new aggregate state object.
> 3. Populate the new aggregate state based on the tokenised StringInfo, this
> would perhaps require that various *_in() functions are called on each
> token.
> 4. Add the new tuple to the aggregate state.
> 5. Build a new StringInfo based on the aggregate state modified in 4.
>
> ?


I don't really know what you mean by parse the StringInfo into tokens.
The whole point of the expanded-object interface is to be able to keep
things in an expanded internal form so that you *don't* have to
repeatedly construct and deconstruct internal data structures.  I
worked up an example of this approach using string_agg(), which I
attach here.  This changes the transition type of string_agg() from
internal to text.  The same code would work for bytea_string_agg(),
which would allow removal of some other code, but this patch doesn't
do that, because the point of this is to elucidate the approach.

In my tests, this seems to be slightly slower than what we're doing
today; worst of all, it adds a handful of cycles to
advance_transition_function() even when the aggregate is not an
expanded object or, indeed, not even pass-by-reference.  Some of this
might be able to be fixed by a little massaging - in particular,
DatumIsReadWriteExpandedObject() seems like it could be partly or
entirely inlined, and maybe there's some other way to improve the
coding here.

Generally, I think finding a way to pass expanded objects through
nodeAgg.c would be a good thing to pursue, if we can make it work.
The immediate impetus for changing things this way would be that we
wouldn't need to add a mechanism for serializing and deserializing
internal functions just to pass around partial aggregates.  But
there's another advantage, too: right now,
advance_transition_function() does a lot of data copying to move data
from per-call context to the per-aggregate context.  When an expanded
object is in use, this can be skipped.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index f49114a..a2c29c4 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -756,11 +756,18 @@ advance_transition_function(AggState *aggstate,
 	aggstate->curpertrans = NULL;
 
 	/*
-	 * If pass-by-ref datatype, must copy the new value into aggcontext and
-	 * pfree the prior transValue.  But if transfn returned a pointer to its
-	 * first input, we don't need to do anything.
+	 * If we got a R/W pointer to an expanded object, we can just take over
+	 * control of the object.  Any other pass-by-ref value must be copied into
+	 * aggcontext and the prior value freed; with the exception that if transfn
+	 * returned a pointer to its first input, we don't need to do anything.
 	 */
-	if (!pertrans->transtypeByVal &&
+	if (DatumIsReadWriteExpandedObject(newVal, fcinfo->isnull,
+									   pertrans->transtypeLen))
+	{
+		newVal = TransferExpandedObject(newVal,
+		  aggstate->aggcontexts[aggstate->current_set]->ecxt_per_tuple_memory);
+	}
+	else if (!pertrans->transtypeByVal &&
 		DatumGetPointer(newVal) != DatumGetPointer(pergroupstate->transValue))
 	{
 		if (!fcinfo->isnull)
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 2cb7bab..61c9359 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -12,7 +12,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = acl.o arrayfuncs.o array_expanded.o array_selfuncs.o \
 	array_typanalyze.o array_userfuncs.o arrayutils.o ascii.o \
 	bool.o cash.o char.o date.o datetime.o datum.o dbsize.o domains.o \
-	encode.o enum.o expandeddatum.o \
+	encode.o enum.o expandeddatum.o expandedstring.o \
 	float.o format_type.o formatting.o genfile.o \
 	geo_ops.o geo_selfuncs.o inet_cidr_ntop.o inet_net_pton.o int.o \
 	int8.o json.o jsonb.o jsonb_gin.o jsonb_op.o jsonb_util.o \
diff --git a/src/backend/utils/adt/expandedstring.c b/src/backend/utils/adt/expandedstring.c
new file mode 100644
index 0000000..4e279a3
--- /dev/null
+++ b/src/backend/utils/adt/expandedstring.c
@@ -0,0 +1,86 @@
+/*-------------------------------------------------------------------------
+ *
+ * expandedstring.c
+ *	  Expand a varlena into a StringInfo.
+ *
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/adt/expandeddatum.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "utils/expandedstring.h"
+#include "utils/memutils.h"
+
+static Size ESI_get_flat_size(ExpandedObjectHeader *eohptr);
+static void ESI_flatten_into(ExpandedObjectHeader *eohptr,
+				 void *result, Size allocated_size);
+
+static const ExpandedObjectMethods ESI_methods =
+{
+	ESI_get_flat_size,
+	ESI_flatten_into
+};
+
+/*
+ * Construct an expanded datum consisting of an empty StringInfo.
+ *
+ * Caller must ensure that CurrentMemoryContext points to a context with
+ * a suitable lifetime.
+ */
+ExpandedStringInfoHeader *
+GetExpandedStringInfo(void)
+{
+	ExpandedStringInfoHeader *esih;
+	MemoryContext	objcxt;
+	MemoryContext	oldcxt;
+
+	objcxt = AllocSetContextCreate(CurrentMemoryContext,
+								   "stringinfo expanded object",
+								   ALLOCSET_SMALL_MINSIZE,
+								   ALLOCSET_SMALL_INITSIZE,
+								   ALLOCSET_DEFAULT_MAXSIZE);
+
+	oldcxt = MemoryContextSwitchTo(objcxt);	
+	esih = palloc(sizeof(ExpandedStringInfoHeader));
+	EOH_init_header(&esih->hdr, &ESI_methods, objcxt);
+	initStringInfo(&esih->buf);
+	MemoryContextSwitchTo(oldcxt);
+
+	return esih;
+}
+
+/*
+ * The space required to flatten a StringInfo back to a plain old varlena is
+ * just the number of bytes we have in the buffer, plus the size of a 4-byte
+ * header.  Even if the buffer is short, we can't flatten to a packed
+ * representation.
+ */
+static Size
+ESI_get_flat_size(ExpandedObjectHeader *eohptr)
+{
+	ExpandedStringInfoHeader *esih = (ExpandedStringInfoHeader *) eohptr;
+
+	return VARHDRSZ + esih->buf.len;
+}
+
+/*
+ * Flattening a StringInfo just involves copying the data into the allocated
+ * space.
+ */
+static void
+ESI_flatten_into(ExpandedObjectHeader *eohptr,
+				 void *result, Size allocated_size)
+{
+	ExpandedStringInfoHeader *esih = (ExpandedStringInfoHeader *) eohptr;
+
+	Assert(allocated_size == VARHDRSZ + esih->buf.len);
+	memcpy(VARDATA(result), esih->buf.data, esih->buf.len);
+	SET_VARSIZE(result, VARHDRSZ + esih->buf.len);
+}
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 8683188..09d7c7e 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -30,6 +30,7 @@
 #include "regex/regex.h"
 #include "utils/builtins.h"
 #include "utils/bytea.h"
+#include "utils/expandedstring.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/pg_locale.h"
@@ -4357,16 +4358,6 @@ pg_column_size(PG_FUNCTION_ARGS)
 	PG_RETURN_INT32(result);
 }
 
-/*
- * string_agg - Concatenates values and returns string.
- *
- * Syntax: string_agg(value text, delimiter text) RETURNS text
- *
- * Note: Any NULL values are ignored. The first-call delimiter isn't
- * actually used at all, and on subsequent calls the delimiter precedes
- * the associated value.
- */
-
 /* subroutine to initialize state */
 static StringInfo
 makeStringAggState(FunctionCallInfo fcinfo)
@@ -4392,46 +4383,54 @@ makeStringAggState(FunctionCallInfo fcinfo)
 	return state;
 }
 
+/*
+ * string_agg - Concatenates values and returns string.
+ *
+ * Syntax: string_agg(value text, delimiter text) RETURNS text
+ *
+ * Note: Any NULL values are ignored. The first-call delimiter isn't
+ * actually used at all, and on subsequent calls the delimiter precedes
+ * the associated value.
+ */
 Datum
 string_agg_transfn(PG_FUNCTION_ARGS)
 {
-	StringInfo	state;
+	ExpandedStringInfoHeader *state;
 
-	state = PG_ARGISNULL(0) ? NULL : (StringInfo) PG_GETARG_POINTER(0);
-
-	/* Append the value unless null. */
-	if (!PG_ARGISNULL(1))
+	/* If the second argument is NULL, just return the first argument. */
+	if (PG_ARGISNULL(1))
 	{
-		/* On the first time through, we ignore the delimiter. */
-		if (state == NULL)
-			state = makeStringAggState(fcinfo);
-		else if (!PG_ARGISNULL(2))
-			appendStringInfoText(state, PG_GETARG_TEXT_PP(2));	/* delimiter */
-
-		appendStringInfoText(state, PG_GETARG_TEXT_PP(1));		/* value */
+		if (PG_ARGISNULL(0))
+			PG_RETURN_NULL();
+		PG_RETURN_DATUM(PG_GETARG_DATUM(0));
 	}
 
 	/*
-	 * The transition type for string_agg() is declared to be "internal",
-	 * which is a pass-by-value type the same size as a pointer.
+	 * Expand the first argument if needed; then, add any delimeter (unless
+	 * the first argument is NULL).
 	 */
-	PG_RETURN_POINTER(state);
-}
-
-Datum
-string_agg_finalfn(PG_FUNCTION_ARGS)
-{
-	StringInfo	state;
-
-	/* cannot be called directly because of internal-type argument */
-	Assert(AggCheckCallContext(fcinfo, NULL));
+	if (!PG_ARGISNULL(0) && VARATT_IS_EXTERNAL_EXPANDED_RW(PG_GETARG_DATUM(0)))
+	{
+        state = (ExpandedStringInfoHeader *) DatumGetEOHP(PG_GETARG_DATUM(0));
+		if (!PG_ARGISNULL(2))
+			appendStringInfoText(&state->buf, PG_GETARG_TEXT_PP(2));
+	}
+	else
+	{
+		state = GetExpandedStringInfo();
+		/* Note that if the transition state is NULL, we skip the delimiter. */
+		if (!PG_ARGISNULL(0))
+		{
+			appendStringInfoText(&state->buf, PG_GETARG_TEXT_PP(0));
+			if (!PG_ARGISNULL(2))
+				appendStringInfoText(&state->buf, PG_GETARG_TEXT_PP(2));
+		}
+	}
 
-	state = PG_ARGISNULL(0) ? NULL : (StringInfo) PG_GETARG_POINTER(0);
+	/* Append second argument to first. */
+	appendStringInfoText(&state->buf, PG_GETARG_TEXT_PP(1));
 
-	if (state != NULL)
-		PG_RETURN_TEXT_P(cstring_to_text_with_len(state->data, state->len));
-	else
-		PG_RETURN_NULL();
+	PG_RETURN_DATUM(EOHPGetRWDatum(&state->hdr));
 }
 
 /*
diff --git a/src/include/catalog/pg_aggregate.h b/src/include/catalog/pg_aggregate.h
index 28b0669..d8f9649 100644
--- a/src/include/catalog/pg_aggregate.h
+++ b/src/include/catalog/pg_aggregate.h
@@ -279,7 +279,7 @@ DATA(insert ( 2335	n 0 array_agg_transfn	array_agg_finalfn	-				-				-				t f 0
 DATA(insert ( 4053	n 0 array_agg_array_transfn array_agg_array_finalfn -		-				-				t f 0	2281	0	0		0	_null_ _null_ ));
 
 /* text */
-DATA(insert ( 3538	n 0 string_agg_transfn	string_agg_finalfn	-				-				-				f f 0	2281	0	0		0	_null_ _null_ ));
+DATA(insert ( 3538	n 0 string_agg_transfn	-		-				-				-				f f 0	25 0	0		0	_null_ _null_ ));
 
 /* bytea */
 DATA(insert ( 3545	n 0 bytea_string_agg_transfn	bytea_string_agg_finalfn	-				-				-		f f 0	2281	0	0		0	_null_ _null_ ));
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index f58672e..6def62d 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2612,10 +2612,8 @@ DESCR("aggregate final function");
 DATA(insert OID = 2817 (  float8_corr				PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 701 "1022" _null_ _null_ _null_ _null_ _null_ float8_corr _null_ _null_ _null_ ));
 DESCR("aggregate final function");
 
-DATA(insert OID = 3535 (  string_agg_transfn		PGNSP PGUID 12 1 0 0 0 f f f f f f i s 3 0 2281 "2281 25 25" _null_ _null_ _null_ _null_ _null_ string_agg_transfn _null_ _null_ _null_ ));
+DATA(insert OID = 3535 (  string_agg_transfn		PGNSP PGUID 12 1 0 0 0 f f f f f f i s 3 0 25 "25 25 25" _null_ _null_ _null_ _null_ _null_ string_agg_transfn _null_ _null_ _null_ ));
 DESCR("aggregate transition function");
-DATA(insert OID = 3536 (  string_agg_finalfn		PGNSP PGUID 12 1 0 0 0 f f f f f f i s 1 0 25 "2281" _null_ _null_ _null_ _null_ _null_ string_agg_finalfn _null_ _null_ _null_ ));
-DESCR("aggregate final function");
 DATA(insert OID = 3538 (  string_agg				PGNSP PGUID 12 1 0 0 0 t f f f f f i s 2 0 25 "25 25" _null_ _null_ _null_ _null_ _null_ aggregate_dummy _null_ _null_ _null_ ));
 DESCR("concatenate aggregate input into a string");
 DATA(insert OID = 3543 (  bytea_string_agg_transfn	PGNSP PGUID 12 1 0 0 0 f f f f f f i s 3 0 2281 "2281 17 17" _null_ _null_ _null_ _null_ _null_ bytea_string_agg_transfn _null_ _null_ _null_ ));
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index b35d206..4af77b9 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -841,7 +841,6 @@ extern Datum pg_column_size(PG_FUNCTION_ARGS);
 extern Datum bytea_string_agg_transfn(PG_FUNCTION_ARGS);
 extern Datum bytea_string_agg_finalfn(PG_FUNCTION_ARGS);
 extern Datum string_agg_transfn(PG_FUNCTION_ARGS);
-extern Datum string_agg_finalfn(PG_FUNCTION_ARGS);
 
 extern Datum text_concat(PG_FUNCTION_ARGS);
 extern Datum text_concat_ws(PG_FUNCTION_ARGS);
diff --git a/src/include/utils/expandedstring.h b/src/include/utils/expandedstring.h
new file mode 100644
index 0000000..bb03d89
--- /dev/null
+++ b/src/include/utils/expandedstring.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * expandedstring.h
+ *	  Expand a varlena into a StringInfo for faster in-memory manipulation.
+ *
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/expandedstring.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef EXPANDEDSTRING_H
+#define EXPANDEDSTRING_H
+
+#include "lib/stringinfo.h"
+#include "utils/expandeddatum.h"
+
+typedef struct ExpandedStringInfoHeader
+{
+	/* Standard header for expanded objects */
+	ExpandedObjectHeader hdr;
+
+	/* String info */
+	StringInfoData		buf;
+} ExpandedStringInfoHeader;
+
+extern ExpandedStringInfoHeader *GetExpandedStringInfo(void);
+
+#endif

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

Reply via email to