On Sun, Jan 17, 2016 at 9:26 PM, David Rowley <david.row...@2ndquadrant.com> wrote: > hmm, so wouldn't that mean that the transition function would need to (for > each input tuple): > > 1. Parse that StringInfo into tokens. > 2. Create a new aggregate state object. > 3. Populate the new aggregate state based on the tokenised StringInfo, this > would perhaps require that various *_in() functions are called on each > token. > 4. Add the new tuple to the aggregate state. > 5. Build a new StringInfo based on the aggregate state modified in 4. > > ?
I don't really know what you mean by parse the StringInfo into tokens. The whole point of the expanded-object interface is to be able to keep things in an expanded internal form so that you *don't* have to repeatedly construct and deconstruct internal data structures. I worked up an example of this approach using string_agg(), which I attach here. This changes the transition type of string_agg() from internal to text. The same code would work for bytea_string_agg(), which would allow removal of some other code, but this patch doesn't do that, because the point of this is to elucidate the approach. In my tests, this seems to be slightly slower than what we're doing today; worst of all, it adds a handful of cycles to advance_transition_function() even when the aggregate is not an expanded object or, indeed, not even pass-by-reference. Some of this might be able to be fixed by a little massaging - in particular, DatumIsReadWriteExpandedObject() seems like it could be partly or entirely inlined, and maybe there's some other way to improve the coding here. Generally, I think finding a way to pass expanded objects through nodeAgg.c would be a good thing to pursue, if we can make it work. The immediate impetus for changing things this way would be that we wouldn't need to add a mechanism for serializing and deserializing internal functions just to pass around partial aggregates. But there's another advantage, too: right now, advance_transition_function() does a lot of data copying to move data from per-call context to the per-aggregate context. When an expanded object is in use, this can be skipped. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c index f49114a..a2c29c4 100644 --- a/src/backend/executor/nodeAgg.c +++ b/src/backend/executor/nodeAgg.c @@ -756,11 +756,18 @@ advance_transition_function(AggState *aggstate, aggstate->curpertrans = NULL; /* - * If pass-by-ref datatype, must copy the new value into aggcontext and - * pfree the prior transValue. But if transfn returned a pointer to its - * first input, we don't need to do anything. + * If we got a R/W pointer to an expanded object, we can just take over + * control of the object. Any other pass-by-ref value must be copied into + * aggcontext and the prior value freed; with the exception that if transfn + * returned a pointer to its first input, we don't need to do anything. */ - if (!pertrans->transtypeByVal && + if (DatumIsReadWriteExpandedObject(newVal, fcinfo->isnull, + pertrans->transtypeLen)) + { + newVal = TransferExpandedObject(newVal, + aggstate->aggcontexts[aggstate->current_set]->ecxt_per_tuple_memory); + } + else if (!pertrans->transtypeByVal && DatumGetPointer(newVal) != DatumGetPointer(pergroupstate->transValue)) { if (!fcinfo->isnull) diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile index 2cb7bab..61c9359 100644 --- a/src/backend/utils/adt/Makefile +++ b/src/backend/utils/adt/Makefile @@ -12,7 +12,7 @@ include $(top_builddir)/src/Makefile.global OBJS = acl.o arrayfuncs.o array_expanded.o array_selfuncs.o \ array_typanalyze.o array_userfuncs.o arrayutils.o ascii.o \ bool.o cash.o char.o date.o datetime.o datum.o dbsize.o domains.o \ - encode.o enum.o expandeddatum.o \ + encode.o enum.o expandeddatum.o expandedstring.o \ float.o format_type.o formatting.o genfile.o \ geo_ops.o geo_selfuncs.o inet_cidr_ntop.o inet_net_pton.o int.o \ int8.o json.o jsonb.o jsonb_gin.o jsonb_op.o jsonb_util.o \ diff --git a/src/backend/utils/adt/expandedstring.c b/src/backend/utils/adt/expandedstring.c new file mode 100644 index 0000000..4e279a3 --- /dev/null +++ b/src/backend/utils/adt/expandedstring.c @@ -0,0 +1,86 @@ +/*------------------------------------------------------------------------- + * + * expandedstring.c + * Expand a varlena into a StringInfo. + * + * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * + * IDENTIFICATION + * src/backend/utils/adt/expandeddatum.c + * + *------------------------------------------------------------------------- + */ +#include "postgres.h" + +#include "fmgr.h" +#include "utils/expandedstring.h" +#include "utils/memutils.h" + +static Size ESI_get_flat_size(ExpandedObjectHeader *eohptr); +static void ESI_flatten_into(ExpandedObjectHeader *eohptr, + void *result, Size allocated_size); + +static const ExpandedObjectMethods ESI_methods = +{ + ESI_get_flat_size, + ESI_flatten_into +}; + +/* + * Construct an expanded datum consisting of an empty StringInfo. + * + * Caller must ensure that CurrentMemoryContext points to a context with + * a suitable lifetime. + */ +ExpandedStringInfoHeader * +GetExpandedStringInfo(void) +{ + ExpandedStringInfoHeader *esih; + MemoryContext objcxt; + MemoryContext oldcxt; + + objcxt = AllocSetContextCreate(CurrentMemoryContext, + "stringinfo expanded object", + ALLOCSET_SMALL_MINSIZE, + ALLOCSET_SMALL_INITSIZE, + ALLOCSET_DEFAULT_MAXSIZE); + + oldcxt = MemoryContextSwitchTo(objcxt); + esih = palloc(sizeof(ExpandedStringInfoHeader)); + EOH_init_header(&esih->hdr, &ESI_methods, objcxt); + initStringInfo(&esih->buf); + MemoryContextSwitchTo(oldcxt); + + return esih; +} + +/* + * The space required to flatten a StringInfo back to a plain old varlena is + * just the number of bytes we have in the buffer, plus the size of a 4-byte + * header. Even if the buffer is short, we can't flatten to a packed + * representation. + */ +static Size +ESI_get_flat_size(ExpandedObjectHeader *eohptr) +{ + ExpandedStringInfoHeader *esih = (ExpandedStringInfoHeader *) eohptr; + + return VARHDRSZ + esih->buf.len; +} + +/* + * Flattening a StringInfo just involves copying the data into the allocated + * space. + */ +static void +ESI_flatten_into(ExpandedObjectHeader *eohptr, + void *result, Size allocated_size) +{ + ExpandedStringInfoHeader *esih = (ExpandedStringInfoHeader *) eohptr; + + Assert(allocated_size == VARHDRSZ + esih->buf.len); + memcpy(VARDATA(result), esih->buf.data, esih->buf.len); + SET_VARSIZE(result, VARHDRSZ + esih->buf.len); +} diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c index 8683188..09d7c7e 100644 --- a/src/backend/utils/adt/varlena.c +++ b/src/backend/utils/adt/varlena.c @@ -30,6 +30,7 @@ #include "regex/regex.h" #include "utils/builtins.h" #include "utils/bytea.h" +#include "utils/expandedstring.h" #include "utils/lsyscache.h" #include "utils/memutils.h" #include "utils/pg_locale.h" @@ -4357,16 +4358,6 @@ pg_column_size(PG_FUNCTION_ARGS) PG_RETURN_INT32(result); } -/* - * string_agg - Concatenates values and returns string. - * - * Syntax: string_agg(value text, delimiter text) RETURNS text - * - * Note: Any NULL values are ignored. The first-call delimiter isn't - * actually used at all, and on subsequent calls the delimiter precedes - * the associated value. - */ - /* subroutine to initialize state */ static StringInfo makeStringAggState(FunctionCallInfo fcinfo) @@ -4392,46 +4383,54 @@ makeStringAggState(FunctionCallInfo fcinfo) return state; } +/* + * string_agg - Concatenates values and returns string. + * + * Syntax: string_agg(value text, delimiter text) RETURNS text + * + * Note: Any NULL values are ignored. The first-call delimiter isn't + * actually used at all, and on subsequent calls the delimiter precedes + * the associated value. + */ Datum string_agg_transfn(PG_FUNCTION_ARGS) { - StringInfo state; + ExpandedStringInfoHeader *state; - state = PG_ARGISNULL(0) ? NULL : (StringInfo) PG_GETARG_POINTER(0); - - /* Append the value unless null. */ - if (!PG_ARGISNULL(1)) + /* If the second argument is NULL, just return the first argument. */ + if (PG_ARGISNULL(1)) { - /* On the first time through, we ignore the delimiter. */ - if (state == NULL) - state = makeStringAggState(fcinfo); - else if (!PG_ARGISNULL(2)) - appendStringInfoText(state, PG_GETARG_TEXT_PP(2)); /* delimiter */ - - appendStringInfoText(state, PG_GETARG_TEXT_PP(1)); /* value */ + if (PG_ARGISNULL(0)) + PG_RETURN_NULL(); + PG_RETURN_DATUM(PG_GETARG_DATUM(0)); } /* - * The transition type for string_agg() is declared to be "internal", - * which is a pass-by-value type the same size as a pointer. + * Expand the first argument if needed; then, add any delimeter (unless + * the first argument is NULL). */ - PG_RETURN_POINTER(state); -} - -Datum -string_agg_finalfn(PG_FUNCTION_ARGS) -{ - StringInfo state; - - /* cannot be called directly because of internal-type argument */ - Assert(AggCheckCallContext(fcinfo, NULL)); + if (!PG_ARGISNULL(0) && VARATT_IS_EXTERNAL_EXPANDED_RW(PG_GETARG_DATUM(0))) + { + state = (ExpandedStringInfoHeader *) DatumGetEOHP(PG_GETARG_DATUM(0)); + if (!PG_ARGISNULL(2)) + appendStringInfoText(&state->buf, PG_GETARG_TEXT_PP(2)); + } + else + { + state = GetExpandedStringInfo(); + /* Note that if the transition state is NULL, we skip the delimiter. */ + if (!PG_ARGISNULL(0)) + { + appendStringInfoText(&state->buf, PG_GETARG_TEXT_PP(0)); + if (!PG_ARGISNULL(2)) + appendStringInfoText(&state->buf, PG_GETARG_TEXT_PP(2)); + } + } - state = PG_ARGISNULL(0) ? NULL : (StringInfo) PG_GETARG_POINTER(0); + /* Append second argument to first. */ + appendStringInfoText(&state->buf, PG_GETARG_TEXT_PP(1)); - if (state != NULL) - PG_RETURN_TEXT_P(cstring_to_text_with_len(state->data, state->len)); - else - PG_RETURN_NULL(); + PG_RETURN_DATUM(EOHPGetRWDatum(&state->hdr)); } /* diff --git a/src/include/catalog/pg_aggregate.h b/src/include/catalog/pg_aggregate.h index 28b0669..d8f9649 100644 --- a/src/include/catalog/pg_aggregate.h +++ b/src/include/catalog/pg_aggregate.h @@ -279,7 +279,7 @@ DATA(insert ( 2335 n 0 array_agg_transfn array_agg_finalfn - - - t f 0 DATA(insert ( 4053 n 0 array_agg_array_transfn array_agg_array_finalfn - - - t f 0 2281 0 0 0 _null_ _null_ )); /* text */ -DATA(insert ( 3538 n 0 string_agg_transfn string_agg_finalfn - - - f f 0 2281 0 0 0 _null_ _null_ )); +DATA(insert ( 3538 n 0 string_agg_transfn - - - - f f 0 25 0 0 0 _null_ _null_ )); /* bytea */ DATA(insert ( 3545 n 0 bytea_string_agg_transfn bytea_string_agg_finalfn - - - f f 0 2281 0 0 0 _null_ _null_ )); diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h index f58672e..6def62d 100644 --- a/src/include/catalog/pg_proc.h +++ b/src/include/catalog/pg_proc.h @@ -2612,10 +2612,8 @@ DESCR("aggregate final function"); DATA(insert OID = 2817 ( float8_corr PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 701 "1022" _null_ _null_ _null_ _null_ _null_ float8_corr _null_ _null_ _null_ )); DESCR("aggregate final function"); -DATA(insert OID = 3535 ( string_agg_transfn PGNSP PGUID 12 1 0 0 0 f f f f f f i s 3 0 2281 "2281 25 25" _null_ _null_ _null_ _null_ _null_ string_agg_transfn _null_ _null_ _null_ )); +DATA(insert OID = 3535 ( string_agg_transfn PGNSP PGUID 12 1 0 0 0 f f f f f f i s 3 0 25 "25 25 25" _null_ _null_ _null_ _null_ _null_ string_agg_transfn _null_ _null_ _null_ )); DESCR("aggregate transition function"); -DATA(insert OID = 3536 ( string_agg_finalfn PGNSP PGUID 12 1 0 0 0 f f f f f f i s 1 0 25 "2281" _null_ _null_ _null_ _null_ _null_ string_agg_finalfn _null_ _null_ _null_ )); -DESCR("aggregate final function"); DATA(insert OID = 3538 ( string_agg PGNSP PGUID 12 1 0 0 0 t f f f f f i s 2 0 25 "25 25" _null_ _null_ _null_ _null_ _null_ aggregate_dummy _null_ _null_ _null_ )); DESCR("concatenate aggregate input into a string"); DATA(insert OID = 3543 ( bytea_string_agg_transfn PGNSP PGUID 12 1 0 0 0 f f f f f f i s 3 0 2281 "2281 17 17" _null_ _null_ _null_ _null_ _null_ bytea_string_agg_transfn _null_ _null_ _null_ )); diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h index b35d206..4af77b9 100644 --- a/src/include/utils/builtins.h +++ b/src/include/utils/builtins.h @@ -841,7 +841,6 @@ extern Datum pg_column_size(PG_FUNCTION_ARGS); extern Datum bytea_string_agg_transfn(PG_FUNCTION_ARGS); extern Datum bytea_string_agg_finalfn(PG_FUNCTION_ARGS); extern Datum string_agg_transfn(PG_FUNCTION_ARGS); -extern Datum string_agg_finalfn(PG_FUNCTION_ARGS); extern Datum text_concat(PG_FUNCTION_ARGS); extern Datum text_concat_ws(PG_FUNCTION_ARGS); diff --git a/src/include/utils/expandedstring.h b/src/include/utils/expandedstring.h new file mode 100644 index 0000000..bb03d89 --- /dev/null +++ b/src/include/utils/expandedstring.h @@ -0,0 +1,31 @@ +/*------------------------------------------------------------------------- + * + * expandedstring.h + * Expand a varlena into a StringInfo for faster in-memory manipulation. + * + * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * src/include/utils/expandedstring.h + * + *------------------------------------------------------------------------- + */ + +#ifndef EXPANDEDSTRING_H +#define EXPANDEDSTRING_H + +#include "lib/stringinfo.h" +#include "utils/expandeddatum.h" + +typedef struct ExpandedStringInfoHeader +{ + /* Standard header for expanded objects */ + ExpandedObjectHeader hdr; + + /* String info */ + StringInfoData buf; +} ExpandedStringInfoHeader; + +extern ExpandedStringInfoHeader *GetExpandedStringInfo(void); + +#endif
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers