Hello Hackers,

PostgreSQL has aggregates for summing a set of values (sum()) but not for
multiplying them.  Computing the product of a column is a fairly common
request -- e.g. compounding growth/return factors, combining independent
probabilities, computing factorial-like or geometric quantities -- and today
it requires either a custom aggregate or the exp(sum(ln(...))) trick, which
does not work for zero or negative inputs and loses precision.

This idea was proposed by Peter Eisentraut, and the attached patch
implements
a built-in PRODUCT() aggregate.

*What it does*
-----

PRODUCT() returns the product of all non-null input values.  It is defined
for
int2, int4, int8, float4, float8 and numeric input, and always returns
numeric.

A few examples:

    CREATE TABLE t (g int, v int);
    INSERT INTO t VALUES (1,2),(1,3),(1,4),(2,5),(2,-6),(2,0);

    SELECT product(v) FROM t;
     product
    ---------
           0

    SELECT g, product(v) FROM t WHERE v <> 0 GROUP BY g ORDER BY g;
     g | product
    ---+---------
     1 |      24
     2 |     -30

Like sum(), PRODUCT() ignores NULL inputs and returns NULL for an empty
input
set (or a group consisting only of NULLs).

*Design / implementation notes*
-----

* The result type and the internal transition state are both numeric,
  regardless of the input type.  Using numeric for the running product
avoids
  overflow in the intermediate state for the integer and floating-point
  variants, where a product grows much faster than a sum.  (A sufficiently
  large product can of course still overflow numeric and raise an error.)

* For numeric input, the transition and combine functions are simply the
  existing numeric_mul().  For the other input types, small non-strict
  transition functions (int2_product_accum, int4_product_accum,
  int8_product_accum, float4_product_accum, float8_product_accum) promote
the
  input to numeric and then call numeric_mul().

* PRODUCT() supports Partial Mode (parallel aggregation), using
numeric_mul()
  as the combine function.  Because the transition type is numeric rather
than
  internal, no serialization/deserialization functions are needed.

* No inverse transition (moving-aggregate) function is provided.  An inverse
  for a product would require division, which is unreliable or undefined
when
  any input is zero (and lossy in general), so as a window aggregate over a
  moving frame PRODUCT() falls back to recomputing the frame.

*Open questions*
-----

* Naming.  I went with PRODUCT(); other systems and discussions have used
  names like PROD or MUL.  Happy to change it if there is a consensus.

* Return type.  Always returning numeric is the safe choice for overflow,
but
  it does mean product(double precision) returns numeric rather than a float
.
  An alternative would be to return float8 for the floating-point inputs.  I
  leaned towards numeric for consistency and to avoid overflow surprises;
  feedback welcome.

* Type coverage.  The patch covers the standard numeric input types.  money
  and interval were intentionally left out, since a product of those types
has
  no clear meaning.

*Testing / docs*
-----

The patch adds regression tests for all input types and the relevant edge
cases
(NULLs, DISTINCT, FILTER, zero/negative inputs, Infinity/NaN, overflow,
parallel
aggregation, and window usage), along with documentation updates.

Thoughts and review feedback are very welcome.

Thanks

-- 
*Jeevan Chalke*
*Senior Principal Engineer, Engineering Manager*
*Product Development*

enterprisedb.com <https://www.enterprisedb.com>

Attachment: v1-0001-Add-PRODUCT-aggregate-function.patch
Description: Binary data

Reply via email to