On Mon, 20 Feb 2023 at 08:03, Joel Jacobson <j...@compiler.org> wrote:
>
> On Mon, Feb 20, 2023, at 08:38, Michael Paquier wrote:
> > Perhaps you should register this patch to the commit of March?  Here
> > it is:
> > https://commitfest.postgresql.org/42/
>
> Thanks, done.
>

I have been testing this a bit, and I get less impressive results than
the ones reported so far.

Testing Andres' example:

SELECT max(a + b + '17'::numeric + c)
  FROM (SELECT generate_series(1::numeric, 1000::numeric)) aa(a),
       (SELECT generate_series(1::numeric, 100::numeric)) bb(b),
       (SELECT generate_series(1::numeric, 10::numeric)) cc(c);

with HEAD, I get:

Time: 216.978 ms
Time: 215.376 ms
Time: 214.973 ms
Time: 216.288 ms
Time: 216.494 ms

and removing the free_var() from numeric_add_opt_error() I get:

Time: 212.706 ms
Time: 212.684 ms
Time: 211.378 ms
Time: 213.383 ms
Time: 213.050 ms

That's 1-2% faster, not the 6-7% that Andres saw.

Testing the same example with the latest 0003-fixed-buf.patch, I get:

Time: 224.115 ms
Time: 225.382 ms
Time: 225.691 ms
Time: 224.135 ms
Time: 225.412 ms

which is now 4% slower.

I think the problem is that if you increase the size of NumericVar,
you increase the stack space required, as well as adding some overhead
to alloc_var(). Also, primitive operations like add_var() directly
call digitbuf_alloc(), so as it stands, they don't benefit from the
static buffer. Also, I'm not convinced that a 4-digit static buffer
would really be of much benefit to many numeric computations anyway.

To try to test the real-world benefit to numeric_in(), I re-ran one of
the tests I used while testing the non-decimal integer patches, using
COPY to read a large number of random numerics from a file:

CREATE TEMP TABLE foo(c1 numeric, c2 numeric, c3 numeric, c4 numeric,
                      c5 numeric, c6 numeric, c7 numeric, c8 numeric,
                      c9 numeric, c10 numeric);

INSERT INTO foo
  SELECT trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4))
  FROM generate_series(1,10000000);
COPY foo TO '/tmp/random-numerics.csv';

\timing on
TRUNCATE foo; COPY foo FROM '/tmp/random-numerics.csv';
TRUNCATE foo; COPY foo FROM '/tmp/random-numerics.csv';
TRUNCATE foo; COPY foo FROM '/tmp/random-numerics.csv';
TRUNCATE foo; COPY foo FROM '/tmp/random-numerics.csv';
TRUNCATE foo; COPY foo FROM '/tmp/random-numerics.csv';

With HEAD, this gave:

Time: 10750.298 ms (00:10.750)
Time: 10746.248 ms (00:10.746)
Time: 10772.277 ms (00:10.772)
Time: 10758.282 ms (00:10.758)
Time: 10760.425 ms (00:10.760)

and with 0003-fixed-buf.patch, it gave:

Time: 10623.254 ms (00:10.623)
Time: 10463.814 ms (00:10.464)
Time: 10461.700 ms (00:10.462)
Time: 10429.436 ms (00:10.429)
Time: 10438.359 ms (00:10.438)

So that's a 2-3% gain, which might be worthwhile, if not for the
slowdown in the other case.

I actually had a slightly different idea to improve numeric.c's memory
management, which gives a noticeable improvement for a few of the
simple numeric functions:

A common pattern in these numeric functions is to allocate memory for
the NumericVar's digit buffer while doing the computation, allocate
more memory for the Numeric result, copy the digits across, and then
free the NumericVar's digit buffer.

That can be reduced to just 1 palloc() and no pfree()'s by ensuring
that the initial allocation is large enough to hold the final Numeric
result, and then re-using that memory instead of allocating more. That
can be applied to all the numeric functions, saving a palloc() and a
pfree() in each case, and it fits quite well with the way
make_result() is used in all but one case (generate_series() needs to
be a little more careful to avoid trampling on the digit buffer of the
current value).

In Andres' generate_series() example, this gave:

Time: 203.838 ms
Time: 206.623 ms
Time: 204.672 ms
Time: 202.434 ms
Time: 204.893 ms

which is around 5-6% faster.

In the COPY test, it gave:

Time: 10511.293 ms (00:10.511)
Time: 10504.831 ms (00:10.505)
Time: 10521.736 ms (00:10.522)
Time: 10513.039 ms (00:10.513)
Time: 10511.979 ms (00:10.512)

which is around 2% faster than HEAD, and around 0.3% slower than
0003-fixed-buf.patch

None of this had any noticeable impact on the time to run the
regression tests, and I tried a few other simple examples, but it was
difficult to get consistent results, above the normal variation of the
test timings.

TBH, I'm yet to be convinced that any of this is actually worthwhile.
We might shave a few percent off some simple numeric operations, but I
doubt it will make much difference to more complex computations. I'd
need to see some more realistic test results, or some real evidence of
palloc/pfree causing significant overhead in a numeric computation.

Regards,
Dean
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
new file mode 100644
index a83feea..c937eaf
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -474,8 +474,14 @@ static void dump_var(const char *str, Nu
 #define dump_var(s,v)
 #endif
 
+/*
+ * Macros to allocate/free numeric digit buffers. Whenever we allocate a digit
+ * buffer, we give it an extra NUMERIC_HDRSZ (8 bytes) of space, so that the
+ * same memory block can be used by make_result() to construct a Numeric result
+ * from it, avoiding palloc/pfree overhead.
+ */
 #define digitbuf_alloc(ndigits)  \
-       ((NumericDigit *) palloc((ndigits) * sizeof(NumericDigit)))
+       ((NumericDigit *) palloc(NUMERIC_HDRSZ + (ndigits) * 
sizeof(NumericDigit)))
 #define digitbuf_free(buf)     \
        do { \
                 if ((buf) != NULL) \
@@ -783,8 +789,6 @@ numeric_in(PG_FUNCTION_ARGS)
                        ereturn(escontext, (Datum) 0,
                                        
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
                                         errmsg("value overflows numeric 
format")));
-
-               free_var(&value);
        }
 
        PG_RETURN_NUMERIC(res);
@@ -1141,8 +1145,6 @@ numeric_recv(PG_FUNCTION_ARGS)
                (void) apply_typmod_special(res, typmod, NULL);
        }
 
-       free_var(&value);
-
        PG_RETURN_NUMERIC(res);
 }
 
@@ -1305,8 +1307,6 @@ numeric           (PG_FUNCTION_ARGS)
        (void) apply_typmod(&var, typmod, NULL);
        new = make_result(&var);
 
-       free_var(&var);
-
        PG_RETURN_NUMERIC(new);
 }
 
@@ -1566,7 +1566,6 @@ numeric_round(PG_FUNCTION_ARGS)
         */
        res = make_result(&arg);
 
-       free_var(&arg);
        PG_RETURN_NUMERIC(res);
 }
 
@@ -1615,7 +1614,6 @@ numeric_trunc(PG_FUNCTION_ARGS)
         */
        res = make_result(&arg);
 
-       free_var(&arg);
        PG_RETURN_NUMERIC(res);
 }
 
@@ -1642,7 +1640,6 @@ numeric_ceil(PG_FUNCTION_ARGS)
        ceil_var(&result, &result);
 
        res = make_result(&result);
-       free_var(&result);
 
        PG_RETURN_NUMERIC(res);
 }
@@ -1670,7 +1667,6 @@ numeric_floor(PG_FUNCTION_ARGS)
        floor_var(&result, &result);
 
        res = make_result(&result);
-       free_var(&result);
 
        PG_RETURN_NUMERIC(res);
 }
@@ -1793,7 +1789,13 @@ generate_series_step_numeric(PG_FUNCTION
                (fctx->step.sign == NUMERIC_NEG &&
                 cmp_var(&fctx->current, &fctx->stop) >= 0))
        {
-               Numeric         result = make_result(&fctx->current);
+               NumericVar      result_var;
+               Numeric         result;
+
+               /* copy current and make result from copy */
+               init_var(&result_var);
+               set_var_from_var(&fctx->current, &result_var);
+               result = make_result(&result_var);
 
                /* switch to memory context appropriate for iteration 
calculation */
                oldcontext = 
MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
@@ -2898,8 +2900,6 @@ numeric_add_opt_error(Numeric num1, Nume
 
        res = make_result_opt_error(&result, have_error);
 
-       free_var(&result);
-
        return res;
 }
 
@@ -2976,8 +2976,6 @@ numeric_sub_opt_error(Numeric num1, Nume
 
        res = make_result_opt_error(&result, have_error);
 
-       free_var(&result);
-
        return res;
 }
 
@@ -3097,8 +3095,6 @@ numeric_mul_opt_error(Numeric num1, Nume
 
        res = make_result_opt_error(&result, have_error);
 
-       free_var(&result);
-
        return res;
 }
 
@@ -3232,8 +3228,6 @@ numeric_div_opt_error(Numeric num1, Nume
 
        res = make_result_opt_error(&result, have_error);
 
-       free_var(&result);
-
        return res;
 }
 
@@ -3321,8 +3315,6 @@ numeric_div_trunc(PG_FUNCTION_ARGS)
 
        res = make_result(&result);
 
-       free_var(&result);
-
        PG_RETURN_NUMERIC(res);
 }
 
@@ -3410,8 +3402,6 @@ numeric_mod_opt_error(Numeric num1, Nume
 
        res = make_result_opt_error(&result, NULL);
 
-       free_var(&result);
-
        return res;
 }
 
@@ -3443,8 +3433,6 @@ numeric_inc(PG_FUNCTION_ARGS)
 
        res = make_result(&arg);
 
-       free_var(&arg);
-
        PG_RETURN_NUMERIC(res);
 }
 
@@ -3537,8 +3525,6 @@ numeric_gcd(PG_FUNCTION_ARGS)
 
        res = make_result(&result);
 
-       free_var(&result);
-
        PG_RETURN_NUMERIC(res);
 }
 
@@ -3597,8 +3583,6 @@ numeric_lcm(PG_FUNCTION_ARGS)
 
        res = make_result(&result);
 
-       free_var(&result);
-
        PG_RETURN_NUMERIC(res);
 }
 
@@ -3648,9 +3632,6 @@ numeric_fac(PG_FUNCTION_ARGS)
 
        res = make_result(&result);
 
-       free_var(&fact);
-       free_var(&result);
-
        PG_RETURN_NUMERIC(res);
 }
 
@@ -3721,8 +3702,6 @@ numeric_sqrt(PG_FUNCTION_ARGS)
 
        res = make_result(&result);
 
-       free_var(&result);
-
        PG_RETURN_NUMERIC(res);
 }
 
@@ -3788,8 +3767,6 @@ numeric_exp(PG_FUNCTION_ARGS)
 
        res = make_result(&result);
 
-       free_var(&result);
-
        PG_RETURN_NUMERIC(res);
 }
 
@@ -3837,8 +3814,6 @@ numeric_ln(PG_FUNCTION_ARGS)
 
        res = make_result(&result);
 
-       free_var(&result);
-
        PG_RETURN_NUMERIC(res);
 }
 
@@ -3908,8 +3883,6 @@ numeric_log(PG_FUNCTION_ARGS)
 
        res = make_result(&result);
 
-       free_var(&result);
-
        PG_RETURN_NUMERIC(res);
 }
 
@@ -4096,8 +4069,6 @@ numeric_power(PG_FUNCTION_ARGS)
 
        res = make_result(&result);
 
-       free_var(&result);
-
        PG_RETURN_NUMERIC(res);
 }
 
@@ -4183,7 +4154,6 @@ numeric_min_scale(PG_FUNCTION_ARGS)
 
        init_var_from_num(num, &arg);
        min_scale = get_min_scale(&arg);
-       free_var(&arg);
 
        PG_RETURN_INT32(min_scale);
 }
@@ -4204,7 +4174,6 @@ numeric_trim_scale(PG_FUNCTION_ARGS)
        init_var_from_num(num, &result);
        result.dscale = get_min_scale(&result);
        res = make_result(&result);
-       free_var(&result);
 
        PG_RETURN_NUMERIC(res);
 }
@@ -4229,8 +4198,6 @@ int64_to_numeric(int64 val)
 
        res = make_result(&result);
 
-       free_var(&result);
-
        return res;
 }
 
@@ -4318,8 +4285,6 @@ int64_div_fast_to_numeric(int64 val1, in
 
        res = make_result(&result);
 
-       free_var(&result);
-
        return res;
 }
 
@@ -4529,8 +4494,6 @@ float8_numeric(PG_FUNCTION_ARGS)
 
        res = make_result(&result);
 
-       free_var(&result);
-
        PG_RETURN_NUMERIC(res);
 }
 
@@ -4623,8 +4586,6 @@ float4_numeric(PG_FUNCTION_ARGS)
 
        res = make_result(&result);
 
-       free_var(&result);
-
        PG_RETURN_NUMERIC(res);
 }
 
@@ -6005,8 +5966,6 @@ numeric_poly_sum(PG_FUNCTION_ARGS)
 
        res = make_result(&result);
 
-       free_var(&result);
-
        PG_RETURN_NUMERIC(res);
 #else
        return numeric_sum(fcinfo);
@@ -6035,8 +5994,6 @@ numeric_poly_avg(PG_FUNCTION_ARGS)
        countd = NumericGetDatum(int64_to_numeric(state->N));
        sumd = NumericGetDatum(make_result(&result));
 
-       free_var(&result);
-
        PG_RETURN_DATUM(DirectFunctionCall2(numeric_div, sumd, countd));
 #else
        return numeric_avg(fcinfo);
@@ -6073,7 +6030,6 @@ numeric_avg(PG_FUNCTION_ARGS)
        init_var(&sumX_var);
        accum_sum_final(&state->sumX, &sumX_var);
        sumX_datum = NumericGetDatum(make_result(&sumX_var));
-       free_var(&sumX_var);
 
        PG_RETURN_DATUM(DirectFunctionCall2(numeric_div, sumX_datum, N_datum));
 }
@@ -6105,7 +6061,6 @@ numeric_sum(PG_FUNCTION_ARGS)
        init_var(&sumX_var);
        accum_sum_final(&state->sumX, &sumX_var);
        result = make_result(&sumX_var);
-       free_var(&sumX_var);
 
        PG_RETURN_NUMERIC(result);
 }
@@ -6198,10 +6153,6 @@ numeric_stddev_internal(NumericAggState
                res = make_result(&vsumX);
        }
 
-       free_var(&vNminus1);
-       free_var(&vsumX);
-       free_var(&vsumX2);
-
        return res;
 }
 
@@ -7691,11 +7642,19 @@ duplicate_numeric(Numeric num)
 /*
  * make_result_opt_error() -
  *
- *     Create the packed db numeric format in palloc()'d memory from
- *     a variable.  This will handle NaN and Infinity cases.
+ *     Create the packed db numeric format from a variable.  This will handle 
NaN
+ *     and Infinity cases.
  *
  *     If "have_error" isn't NULL, on overflow *have_error is set to true and
  *     NULL is returned.  This is helpful when caller needs to handle errors.
+ *
+ *     NOTE: This reuses the memory allocated for the variable's digit buffer, 
if
+ *     possible.  The variable must not be freed or used in any way afterwards 
(we
+ *     may trample over its digit buffer).  If the variable doesn't have an
+ *     allocated digit buffer, new memory will be palloc'd and returned.  This
+ *     makes it safe to use with constant variables (const_one, const_nan, 
etc.),
+ *     as well as variables initialized using init_var_from_num(), zeroed using
+ *     zero_var(), etc.
  */
 static Numeric
 make_result_opt_error(const NumericVar *var, bool *have_error)
@@ -7756,7 +7715,9 @@ make_result_opt_error(const NumericVar *
        if (NUMERIC_CAN_BE_SHORT(var->dscale, weight))
        {
                len = NUMERIC_HDRSZ_SHORT + n * sizeof(NumericDigit);
-               result = (Numeric) palloc(len);
+               result = var->buf ? (Numeric) var->buf : (Numeric) palloc(len);
+               if (n > 0)
+                       memmove(result->choice.n_short.n_data, digits, n * 
sizeof(NumericDigit));
                SET_VARSIZE(result, len);
                result->choice.n_short.n_header =
                        (sign == NUMERIC_NEG ? (NUMERIC_SHORT | 
NUMERIC_SHORT_SIGN_MASK)
@@ -7768,7 +7729,9 @@ make_result_opt_error(const NumericVar *
        else
        {
                len = NUMERIC_HDRSZ + n * sizeof(NumericDigit);
-               result = (Numeric) palloc(len);
+               result = var->buf ? (Numeric) var->buf : (Numeric) palloc(len);
+               if (n > 0)
+                       memmove(result->choice.n_long.n_data, digits, n * 
sizeof(NumericDigit));
                SET_VARSIZE(result, len);
                result->choice.n_long.n_sign_dscale =
                        sign | (var->dscale & NUMERIC_DSCALE_MASK);
@@ -7776,8 +7739,6 @@ make_result_opt_error(const NumericVar *
        }
 
        Assert(NUMERIC_NDIGITS(result) == n);
-       if (n > 0)
-               memcpy(NUMERIC_DIGITS(result), digits, n * 
sizeof(NumericDigit));
 
        /* Check for overflow of int16 fields */
        if (NUMERIC_WEIGHT(result) != weight ||

Reply via email to