I noticed that we have a bottleneck in aggregate performance in advance_transition_function(): according to callgrind, doing datumCopy() and pfree() for every tuple produced by the transition function is pretty expensive. Some queries bare this out:
dvl=# SELECT W.element_id, count(W.element_ID) FROM watch_list_element W GROUP by W.element_id ORDER by count(W.element_ID) LIMIT 5; element_id | count ------------+------- 65525 | 1 163816 | 1 16341 | 1 131023 | 1 65469 | 1 (5 rows) Time: 176.723 ms dvl=# select count(*) from watch_list_element; count -------- 138044 (1 row) Time: 94.246 ms I've attached a quick and dirty hack that avoids the need to palloc() and pfree() for every tuple produced by the aggregate's transition function. This results in: dvl=# SELECT W.element_id, count(W.element_ID) FROM watch_list_element W GROUP by W.element_id ORDER by count(W.element_ID) LIMIT 5; element_id | count ------------+------- 65525 | 1 163816 | 1 16341 | 1 131023 | 1 65469 | 1 (5 rows) Time: 154.378 ms dvl=# select count(*) from watch_list_element; count -------- 138044 (1 row) Time: 73.975 ms I can reproduce this performance difference consistently. I thought this might have been attributable to memory checking overhead because assertions were enabled, but that doesn't appear to be the case (the above results are without --enable-cassert). The attached patch invokes the transition function in the current memory context. I don't think that's right: a memory leak in an aggregate's transition function would be problematic when we're invoked from a per-query memory context, as is the case with advance_aggregates(). Perhaps we need an additional short-lived memory context in AggStatePerAggData: we could invoke the transition function in that context, and reset it once per, say, 1000 tuples. Alternatively we could just mandate that aggregate transition function's not leak memory and then invoke the transition function in, say, the aggregate's memory context, but that seems a little fragile. Comments? -Neil
# # patch "src/backend/executor/nodeAgg.c" # from [851fd2d59a89ee2e3c23a1ca8fdbf4d466f98d26] # to [532ce100a0de1288fb23acc8de3a6bcac0982ec3] # --- src/backend/executor/nodeAgg.c +++ src/backend/executor/nodeAgg.c @@ -350,10 +350,10 @@ return; } } - +#if 0 /* We run the transition functions in per-input-tuple memory context */ oldContext = MemoryContextSwitchTo(aggstate->tmpcontext->ecxt_per_tuple_memory); - +#endif /* * OK to call the transition function * @@ -375,6 +375,7 @@ newVal = FunctionCallInvoke(&fcinfo); +#if 0 /* * If pass-by-ref datatype, must copy the new value into aggcontext * and pfree the prior transValue. But if transfn returned a pointer @@ -393,11 +394,13 @@ if (!pergroupstate->transValueIsNull) pfree(DatumGetPointer(pergroupstate->transValue)); } +#endif pergroupstate->transValue = newVal; pergroupstate->transValueIsNull = fcinfo.isnull; - +#if 0 MemoryContextSwitchTo(oldContext); +#endif } /*
---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster