I am trying to see if I can write a C aggregate function that operates on
numeric[] array columns, and essentially performs aggregates on the array
elements as if they were individual columns, resulting in an output numeric[]
array with the aggregate element values. I do realise I can use unnest() and
then the built-in aggregates like this:
SELECT array_agg(v ORDER BY i) FROM (
SELECT i, sum(v) AS v
FROM (VALUES
(ARRAY[1,2,3]::numeric[]),
(ARRAY[2,3,4]::numeric[])
) n(nums), unnest(n.nums) WITH ORDINALITY AS p(v,i)
GROUP BY i
) g;
array_agg
-----------
{3,5,7}
What I am doing is based on the aggs_for_vecs [1] extension so my goal is to
have SQL like this:
SELECT vec_to_sum(nums) FROM (VALUES
(ARRAY[1,2,3]::numeric[]),
(ARRAY[2,3,4]::numeric[])
) n(nums);
I have the extension working with numerics by doing numeric calculations with
the help of the functions defined in numeric.h (numeric_add_opt_error() in this
case) but for other aggregates like average or var_samp I was hoping to piggy
back off all the support in numeric.c. The primary motivation for doing this as
a C extension is because it has proved to execute much faster than the
equivalent unnest() SQL.
So far, I have been working on average support via the vec_to_mean() aggregate,
and my aggregate's [2] transition function sets up a FunctionCallInfo for the
numeric_avg_accum() [3] function and then loops over the input array elements,
calling numeric_avg_accum() and saving its result state object in my
aggregate’s state. Before looping, I switch the memory context to the
aggregate’s context, i.e. there is stuff like
MemoryContext aggContext;
AggCheckCallContext(fcinfo, &aggContext);
old = MemoryContextSwitchTo(aggContext);
for (i = 0; i < arrayLength; i++) {
// invoke numeric_avg_accum() for each array element, store result in my state
}
MemoryContextSwitchTo(old);
In my aggregate function's final function I set up a FunctionCallInfo for the
numeric_avg() function and loop over all the numeric_avg_accum() state objects
[4], saving their results as the final array returned from the vec_to_mean()
aggregate.
Overall, the approach seems like it should work, however I don’t believe I’m
handling the memory contexts correctly. The function works sometimes, but
crashes or returns incorrect results sometimes. I tried to debug what’s going
on, but I am very new to working on a C extension for Postgres so all I surmise
so far is that some of my saved state objects appear to be released before they
should be. That finally brings me to my questions:
1. Is it even reasonable for me to try to do this approach?
2. Is there anything I should be doing differently with memory contexts, like
creating a new one(s) for the calls to numeric_avg_accum()?
3. Is there something else I’m doing wrong?
Thank you very much for your time. I hope I was clear, as I mentioned this is
all quite new to me so my assumptions/approach/terminology might be off.
Cheers,
Matt Magoffin
[1] https://github.com/pjungwir/aggs_for_vecs
[2]
https://github.com/SolarNetwork/aggs_for_vecs/blob/feature/numeric-stats-agg/vec_to_mean_numeric.c
[3]
https://github.com/SolarNetwork/aggs_for_vecs/blob/7c2a5aad35a814dca6d9f5a35df1de6e4b98d149/vec_to_mean_numeric.c#L57-L58
[4]
https://github.com/SolarNetwork/aggs_for_vecs/blob/7c2a5aad35a814dca6d9f5a35df1de6e4b98d149/vec_to_mean_numeric.c#L117-L126