Greetings,

  The array_accum example aggregate in the user documentation works
  reasonably on small data sets but doesn't work too hot on large ones.
  http://www.postgresql.org/docs/8.1/static/xaggr.html

  Normally I wouldn't care particularly much but it turns out that PL/R
  uses arrays for quite a bit (eg: histograms and other statistics
  functions).  I've also heard other complaints about the performance of
  arrays, though I'm not sure if those were due to array_accum or
  something else.

  Long story short, I set out to build a faster array_accum.  Much to my
  suprise and delight, we already *had* one.  accumArrayResult() and
  makeArrayResult()/construct_md_array() appear to do a fantastic job.
  I've created a couple of 'glue' functions to expose these functions so
  they can be used in an aggregate.  I'm sure they could be improved
  upon and possibly made even smaller than they already are (90 lines
  total for both) but I'd like to throw out the idea of including them
  in core.  The aggregate created with them could also be considered for
  inclusion though I'm less concerned with that.  I don't expect general
  PostgreSQL users would have trouble creating the aggregate- I don't
  know that the average user would be able or willing to write the C
  functions.

  For comparison, the new functions run with:
  time psql -c "select aaccum(generate_series) from 
generate_series(1,1000000);" > /dev/null
      4.24s real     0.34s user     0.06s system

  Compared to:
  time psql -c "select array_accum(generate_series) from 
generate_series(1,1000000);" > /dev/null
  ...

  Well, it's still running and it's been over an hour.

  The main differences, as I see it, are: accumArrayResult() works in
  chunks of 64 elements, and uses repalloc().  array_accum uses
  array_set() which works on individual elements and uses
  palloc()/memcpy().  I appriciate that this is done because for most
  cases of array_set() it's not acceptable to modify the input and am
  not suggesting that be changed.  An alternative might be to modify
  array_set() to check if it is in an aggregate and change its behavior
  but adding the seperate functions seemed cleaner and much less
  intrusive to me.

  Please find the functions attached.

                Thanks,

                        Stephen
#include "postgres.h"
#include "fmgr.h"
#include "utils/array.h"
#include "nodes/execnodes.h"

PG_MODULE_MAGIC;
PG_FUNCTION_INFO_V1(aaccum_sfunc);

Datum
aaccum_sfunc(PG_FUNCTION_ARGS)
{
	int32			totlen;
	bytea			*storage;
	Datum	 		element;
	ArrayBuildState	*astate = NULL;
	AggState		*aggstate;

	/* Make sure we are in an aggregate. */
	if (!fcinfo->context || !IsA(fcinfo->context, AggState))
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("Can not call aaccum_sfunc as a non-aggregate")));

	aggstate = (AggState*) fcinfo->context;

	/* Initial call just passes in NULLs, so just allocate memory
	 * and get set up. */
	if (PG_ARGISNULL(0)) {
		storage = (bytea*) palloc(VARHDRSZ+sizeof(ArrayBuildState*));
		storage->vl_len = VARHDRSZ+sizeof(ArrayBuildState*);
		astate = NULL;
		memcpy(storage->vl_dat,&astate,sizeof(astate));
	} else {
		storage = PG_GETARG_BYTEA_P(0);
	}

	memcpy(&astate,storage->vl_dat,sizeof(astate));

	element = PG_GETARG_DATUM(1);

	astate = accumArrayResult(astate, element, PG_ARGISNULL(1), 
					          get_fn_expr_argtype(fcinfo->flinfo, 1),
							  aggstate->aggcontext);

	memcpy(storage->vl_dat,&astate,sizeof(astate));

	PG_RETURN_BYTEA_P(storage);
}

PG_FUNCTION_INFO_V1(aaccum_ffunc);

Datum
aaccum_ffunc(PG_FUNCTION_ARGS)
{
	int			dims[1];
	int			lbs[1];
	bytea		*storage;
	ArrayBuildState	*astate;
	AggState		*aggstate;
	ArrayType	*result;

	/* Make sure we are in an aggregate. */
	if (!fcinfo->context || !IsA(fcinfo->context, AggState))
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("Can not call aaccum_sfunc as a non-aggregate")));

	aggstate = (AggState*) fcinfo->context;

	if (PG_ARGISNULL(0)) PG_RETURN_ARRAYTYPE_P(NULL);

	storage = (bytea*) PG_GETARG_BYTEA_P(0);
	memcpy(&astate,storage->vl_dat,sizeof(astate));

	dims[0] = astate->nelems;
	lbs[0] = 1;

	result = construct_md_array(astate->dvalues,
								astate->dnulls,
								1,
								dims,
								lbs,
								astate->element_type,
								astate->typlen,
								astate->typbyval,
								astate->typalign);

	PG_RETURN_ARRAYTYPE_P(PointerGetDatum(result));
}
---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Reply via email to