Re: Missing free_var() at end of accum_sum_final()?
On Thu, Feb 16, 2023 at 06:59:13AM +0100, Joel Jacobson wrote: > I noticed the NumericVar's pos_var and neg_var are not free_var()'d > at the end of accum_sum_final(). > > The potential memory leak seems small, since the function is called > only once per sum() per worker (and from a few more places), but > maybe they should be free'd anyways for correctness? Indeed, it is true that any code path of numeric.c that relies on a NumericVar with an allocation done in its buffer is careful enough to free it, except for generate_series's SRF where one step of the computation is done. I don't see directly why you could not do the following: @@ -11973,6 +11973,9 @@ accum_sum_final(NumericSumAccum *accum, NumericVar *result) /* And add them together */ add_var(&pos_var, &neg_var, result); + free_var(&pos_var); + free_var(&neg_var); + /* Remove leading/trailing zeroes */ strip_var(result); -- Michael signature.asc Description: PGP signature
Re: Missing free_var() at end of accum_sum_final()?
On Thu, Feb 16, 2023, at 07:26, Michael Paquier wrote: > Indeed, it is true that any code path of numeric.c that relies on a > NumericVar with an allocation done in its buffer is careful enough to > free it, except for generate_series's SRF where one step of the > computation is done. I don't see directly why you could not do the > following: > @@ -11973,6 +11973,9 @@ accum_sum_final(NumericSumAccum *accum, > NumericVar *result) > /* And add them together */ > add_var(&pos_var, &neg_var, result); > > + free_var(&pos_var); > + free_var(&neg_var); > + Thanks for looking and explaining. I added the free_var() calls after strip_var() to match the similar existing code at the end of sqrt_var(). Not that it matters, but thought it looked nicer to keep them in the same order. /Joel numeric-accum-sum-final-free-var.patch Description: Binary data
Re: Missing free_var() at end of accum_sum_final()?
On Thu, Feb 16, 2023 at 08:08:37AM +0100, Joel Jacobson wrote: > I added the free_var() calls after strip_var() to match the similar > existing code at the end of sqrt_var(). > Not that it matters, but thought it looked nicer to keep them in the > same order. WFM. Thanks! -- Michael signature.asc Description: PGP signature
Re: Missing free_var() at end of accum_sum_final()?
Hi, On 2023-02-16 15:26:26 +0900, Michael Paquier wrote: > On Thu, Feb 16, 2023 at 06:59:13AM +0100, Joel Jacobson wrote: > > I noticed the NumericVar's pos_var and neg_var are not free_var()'d > > at the end of accum_sum_final(). > > > > The potential memory leak seems small, since the function is called > > only once per sum() per worker (and from a few more places), but > > maybe they should be free'd anyways for correctness? > > Indeed, it is true that any code path of numeric.c that relies on a > NumericVar with an allocation done in its buffer is careful enough to > free it, except for generate_series's SRF where one step of the > computation is done. I don't see directly why you could not do the > following: > @@ -11973,6 +11973,9 @@ accum_sum_final(NumericSumAccum *accum, NumericVar > *result) > /* And add them together */ > add_var(&pos_var, &neg_var, result); > > + free_var(&pos_var); > + free_var(&neg_var); > + > /* Remove leading/trailing zeroes */ > strip_var(result); But why do we need it? Most SQL callable functions don't need to be careful about not leaking O(1) memory, the exception being functions backing btree opclasses. In fact, the detailed memory management often is *more* expensive than just relying on the calling memory context being reset. Of course, numeric.c doesn't really seem to have gotten that message, so there's a consistency argument here. Greetings, Andres Freund
Re: Missing free_var() at end of accum_sum_final()?
On Thu, Feb 16, 2023 at 01:35:54PM -0800, Andres Freund wrote: > But why do we need it? Most SQL callable functions don't need to be careful > about not leaking O(1) memory, the exception being functions backing btree > opclasses. > > In fact, the detailed memory management often is *more* expensive than just > relying on the calling memory context being reset. > > Of course, numeric.c doesn't really seem to have gotten that message, so > there's a consistency argument here. I don't know which final result is better. The arguments go two ways: 1) Should numeric.c be simplified so as its memory structure with extra pfree()s, making it more consistent with more global assumptions than just this file? This has the disadvantage of creating more noise in backpatching, while increasing the risk of leaks if some of the removed parts are allocated in a tight loop within the same context. This makes memory management less complicated. That's how I am understanding your point. 2) Should the style within numeric.c be more consistent? This is how I am understanding this proposal. As you quote, this makes memory management more complicated (not convinced about that for the internal of numerics?), while making the file more consistent. At the end, perhaps that's not worth bothering, but 2) prevails when it comes to the rule of making some code consistent with its surroundings. 1) has more risks seeing how old this code is. -- Michael signature.asc Description: PGP signature
Re: Missing free_var() at end of accum_sum_final()?
Hi, On 2023-02-17 11:48:14 +0900, Michael Paquier wrote: > On Thu, Feb 16, 2023 at 01:35:54PM -0800, Andres Freund wrote: > > But why do we need it? Most SQL callable functions don't need to be careful > > about not leaking O(1) memory, the exception being functions backing btree > > opclasses. > > > > In fact, the detailed memory management often is *more* expensive than just > > relying on the calling memory context being reset. > > > > Of course, numeric.c doesn't really seem to have gotten that message, so > > there's a consistency argument here. > > I don't know which final result is better. The arguments go two ways: > 1) Should numeric.c be simplified so as its memory structure with extra > pfree()s, making it more consistent with more global assumptions than > just this file? This has the disadvantage of creating more noise in > backpatching, while increasing the risk of leaks if some of the > removed parts are allocated in a tight loop within the same context. > This makes memory management less complicated. That's how I am > understanding your point. It's not just simplification, it's just faster to free via context reset. I whipped up a random query exercising numeric math a bunch: SELECT max(a + b + '17'::numeric + c) FROM (SELECT generate_series(1::numeric, 1000::numeric)) aa(a), (SELECT generate_series(1::numeric, 100::numeric)) bb(b), (SELECT generate_series(1::numeric, 10::numeric)) cc(c); Removing the free_var()s from numeric_add_opt_error() speeds it up from ~361ms to ~338ms. This code really needs some memory management overhead reduction love. Many allocation could be avoided by having a small on-stack "buffer" that's used unless the numeric is large. > 2) Should the style within numeric.c be more consistent? This is how > I am understanding this proposal. As you quote, this makes memory > management more complicated (not convinced about that for the internal > of numerics?), while making the file more consistent. > At the end, perhaps that's not worth bothering, but 2) prevails when > it comes to the rule of making some code consistent with its > surroundings. 1) has more risks seeing how old this code is. I'm a bit wary that this will trigger a stream of patches to pointlessly free things, causing churn and slowdowns. I suspect there's other places in numeric.c where we don't free, and there certainly are a crapton in other functions. Greetings, Andres Freund
Re: Missing free_var() at end of accum_sum_final()?
On Fri, Feb 17, 2023, at 21:26, Andres Freund wrote: > Removing the free_var()s from numeric_add_opt_error() speeds it up from ~361ms > to ~338ms. I notice numeric_add_opt_error() is extern and declared in numeric.h, called from e.g. timestamp.c and jsonpath_exec.c. Is that a problem, i.e. is there a risk it could be used in a for loop by some code outside Numeric? > This code really needs some memory management overhead reduction love. Many > allocation could be avoided by having a small on-stack "buffer" that's used > unless the numeric is large. Nice idea! Could something like the attached patch work? /Joel 0001-fixed-buf.patch Description: Binary data
Re: Missing free_var() at end of accum_sum_final()?
Hi again, Ignore previous patch, new correct version attached, that also keeps track of if the buf-field is in use or not. Seems like your idea gives a signifiant speed-up! On my machine, numeric_in() is 22% faster! I ran a benchmark with 100 tests measuring execution-time of numeric_in() with two significant digits time precision. Benchmark: CREATE EXTENSION pit; SELECT count(pit.async('numeric_in','{1234,0,-1}',2)) FROM generate_series(1,100); CALL pit.work(true); SELECT format('%s(%s)',pit.test_params.function_name, array_to_string(pit.test_params.input_values,',')), pit.pretty_time(pit.tests.final_result), count(*) FROM pit.tests JOIN pit.test_params USING (id) GROUP BY 1,2 ORDER BY 1,2; HEAD: format | pretty_time | count ---+-+--- numeric_in(1234,0,-1) | 31 ns |34 numeric_in(1234,0,-1) | 32 ns |66 (2 rows) 0002-fixed-buf.patch: format | pretty_time | count ---+-+--- numeric_in(1234,0,-1) | 24 ns | 4 numeric_in(1234,0,-1) | 25 ns |71 numeric_in(1234,0,-1) | 26 ns |25 (3 rows) The benchmark results was produced with: https://github.com/joelonsql/pg-timeit Now, the question is how big of a fixed_buf we want. I will run some more benchmarks. /Joel On Sun, Feb 19, 2023, at 20:54, Joel Jacobson wrote: > On Fri, Feb 17, 2023, at 21:26, Andres Freund wrote: >> Removing the free_var()s from numeric_add_opt_error() speeds it up from >> ~361ms >> to ~338ms. > > I notice numeric_add_opt_error() is extern and declared in numeric.h, > called from e.g. timestamp.c and jsonpath_exec.c. Is that a problem, > i.e. is there a risk it could be used in a for loop by some code > outside Numeric? > >> This code really needs some memory management overhead reduction love. Many >> allocation could be avoided by having a small on-stack "buffer" that's used >> unless the numeric is large. > > Nice idea! > Could something like the attached patch work? > > /Joel > Attachments: > * 0001-fixed-buf.patch -- Kind regards, Joel 0002-fixed-buf.patch Description: Binary data
Re: Missing free_var() at end of accum_sum_final()?
On Sun, Feb 19, 2023, at 23:16, Joel Jacobson wrote: > Hi again, > > Ignore previous patch, new correct version attached, that also keeps > track of if the buf-field is in use or not. Ops! One more thinko, detected when trying fixed_buf[8], which caused a seg fault. To fix, a init_var() in alloc_var() is needed when we will use the fixed_buf instead of allocating, since alloc_var() could be called in a loop for existing values, like in sqrt_var() for instance. Attached new version produces similar benchmark results, even with the extra init_var(): HEAD: format | pretty_time | count ---+-+--- numeric_in(1234,0,-1) | 31 ns |74 numeric_in(1234,0,-1) | 32 ns |26 (2 rows) 0003-fixed-buf.patch: format | pretty_time | count ---+-+--- numeric_in(1234,0,-1) | 24 ns | 1 numeric_in(1234,0,-1) | 25 ns |84 numeric_in(1234,0,-1) | 26 ns |15 (3 rows) /Joel 0003-fixed-buf.patch Description: Binary data
Re: Missing free_var() at end of accum_sum_final()?
On Sun, Feb 19, 2023 at 11:55:38PM +0100, Joel Jacobson wrote: > To fix, a init_var() in alloc_var() is needed when we will use the > fixed_buf instead of allocating, > since alloc_var() could be called in a loop for existing values, > like in sqrt_var() for instance. > > Attached new version produces similar benchmark results, even with the extra > init_var(): Perhaps you should register this patch to the commit of March? Here it is: https://commitfest.postgresql.org/42/ -- Michael signature.asc Description: PGP signature
Re: Missing free_var() at end of accum_sum_final()?
On Mon, Feb 20, 2023, at 08:38, Michael Paquier wrote: > Perhaps you should register this patch to the commit of March? Here > it is: > https://commitfest.postgresql.org/42/ Thanks, done. /Joel
Re: Missing free_var() at end of accum_sum_final()?
On Mon, 20 Feb 2023 at 08:03, Joel Jacobson wrote: > > On Mon, Feb 20, 2023, at 08:38, Michael Paquier wrote: > > Perhaps you should register this patch to the commit of March? Here > > it is: > > https://commitfest.postgresql.org/42/ > > Thanks, done. > I have been testing this a bit, and I get less impressive results than the ones reported so far. Testing Andres' example: SELECT max(a + b + '17'::numeric + c) FROM (SELECT generate_series(1::numeric, 1000::numeric)) aa(a), (SELECT generate_series(1::numeric, 100::numeric)) bb(b), (SELECT generate_series(1::numeric, 10::numeric)) cc(c); with HEAD, I get: Time: 216.978 ms Time: 215.376 ms Time: 214.973 ms Time: 216.288 ms Time: 216.494 ms and removing the free_var() from numeric_add_opt_error() I get: Time: 212.706 ms Time: 212.684 ms Time: 211.378 ms Time: 213.383 ms Time: 213.050 ms That's 1-2% faster, not the 6-7% that Andres saw. Testing the same example with the latest 0003-fixed-buf.patch, I get: Time: 224.115 ms Time: 225.382 ms Time: 225.691 ms Time: 224.135 ms Time: 225.412 ms which is now 4% slower. I think the problem is that if you increase the size of NumericVar, you increase the stack space required, as well as adding some overhead to alloc_var(). Also, primitive operations like add_var() directly call digitbuf_alloc(), so as it stands, they don't benefit from the static buffer. Also, I'm not convinced that a 4-digit static buffer would really be of much benefit to many numeric computations anyway. To try to test the real-world benefit to numeric_in(), I re-ran one of the tests I used while testing the non-decimal integer patches, using COPY to read a large number of random numerics from a file: CREATE TEMP TABLE foo(c1 numeric, c2 numeric, c3 numeric, c4 numeric, c5 numeric, c6 numeric, c7 numeric, c8 numeric, c9 numeric, c10 numeric); INSERT INTO foo SELECT trim_scale(round(random()::numeric*1e4, 4)), trim_scale(round(random()::numeric*1e4, 4)), trim_scale(round(random()::numeric*1e4, 4)), trim_scale(round(random()::numeric*1e4, 4)), trim_scale(round(random()::numeric*1e4, 4)), trim_scale(round(random()::numeric*1e4, 4)), trim_scale(round(random()::numeric*1e4, 4)), trim_scale(round(random()::numeric*1e4, 4)), trim_scale(round(random()::numeric*1e4, 4)), trim_scale(round(random()::numeric*1e4, 4)) FROM generate_series(1,1000); COPY foo TO '/tmp/random-numerics.csv'; \timing on TRUNCATE foo; COPY foo FROM '/tmp/random-numerics.csv'; TRUNCATE foo; COPY foo FROM '/tmp/random-numerics.csv'; TRUNCATE foo; COPY foo FROM '/tmp/random-numerics.csv'; TRUNCATE foo; COPY foo FROM '/tmp/random-numerics.csv'; TRUNCATE foo; COPY foo FROM '/tmp/random-numerics.csv'; With HEAD, this gave: Time: 10750.298 ms (00:10.750) Time: 10746.248 ms (00:10.746) Time: 10772.277 ms (00:10.772) Time: 10758.282 ms (00:10.758) Time: 10760.425 ms (00:10.760) and with 0003-fixed-buf.patch, it gave: Time: 10623.254 ms (00:10.623) Time: 10463.814 ms (00:10.464) Time: 10461.700 ms (00:10.462) Time: 10429.436 ms (00:10.429) Time: 10438.359 ms (00:10.438) So that's a 2-3% gain, which might be worthwhile, if not for the slowdown in the other case. I actually had a slightly different idea to improve numeric.c's memory management, which gives a noticeable improvement for a few of the simple numeric functions: A common pattern in these numeric functions is to allocate memory for the NumericVar's digit buffer while doing the computation, allocate more memory for the Numeric result, copy the digits across, and then free the NumericVar's digit buffer. That can be reduced to just 1 palloc() and no pfree()'s by ensuring that the initial allocation is large enough to hold the final Numeric result, and then re-using that memory instead of allocating more. That can be applied to all the numeric functions, saving a palloc() and a pfree() in each case, and it fits quite well with the way make_result() is used in all but one case (generate_series() needs to be a little more careful to avoid trampling on the digit buffer of the current value). In Andres' generate_series() example, this gave: Time: 203.838 ms Time: 206.623 ms Time: 204.672 ms Time: 202.434 ms Time: 204.893 ms which is around 5-6% faster. In the COPY test, it gave: Time: 10511.293 ms (00:10.511) Time: 10504.831 ms (00:10.505) Time: 10521.736 ms (00:10.522) Time: 10513.039 ms (00:10.513) Time: 10511.979 ms (00:10.512) which is around 2% faster than HEAD, and around 0.3% slower than 0003-fixed-buf.patch None of this had any noticeable impact on the time to run the regression tests, and I tried a few other simple examples, but it was difficult to get consistent results, above the normal variation of the test timings. TBH, I'm yet to be convinced that any of this is actually worthwhile. We might shave a few percent off some simple numeric operations, but I
Re: Missing free_var() at end of accum_sum_final()?
Hi, I found another small but significant improvement of the previous patch: else if (ndigits < var->buf_len) { -memset(var->buf, 0, var->buf_len); +var->buf[0] = 0; var->digits = var->buf + 1; var->ndigits = ndigits; } We don't need to set all buf elements to zero, only the first one. This is not an improvement of HEAD, it's just a mistake I made in my previous patch. COPY foo FROM '/tmp/random-numerics.csv'; HEAD: Time: 8431.325 ms (00:08.431) Time: 8424.749 ms (00:08.425) Time: 8425.387 ms (00:08.425) Time: 8519.869 ms (00:08.520) Time: 8452.585 ms (00:08.453) 0004-fixed-buf.patch: Time: 8539.475 ms (00:08.539) Time: 8401.628 ms (00:08.402) Time: 8399.440 ms (00:08.399) Time: 8373.861 ms (00:08.374) Time: 8388.002 ms (00:08.388) 0005-fixed-buf.patch: Time: 8038.218 ms (00:08.038) Time: 8082.898 ms (00:08.083) Time: 7999.950 ms (00:08.000) Time: 8039.640 ms (00:08.040) Time: 7994.816 ms (00:07.995) Almost half a second faster! /Joel 0005-fixed-buf.patch Description: Binary data
Re: Missing free_var() at end of accum_sum_final()?
Hi, My apologies, it seems my email didn't reach the list, probably due to the benchmark plot images being to large. Here is the email again, but with URLs to the images instead, and benchmark updated with results for the 0005-fixed-buf.patch. -- On Mon, Feb 20, 2023, at 12:32, Dean Rasheed wrote: > TBH, I'm yet to be convinced that any of this is actually worthwhile. > We might shave a few percent off some simple numeric operations, but I > doubt it will make much difference to more complex computations. I'd > need to see some more realistic test results, or some real evidence of > palloc/pfree causing significant overhead in a numeric computation. Thanks for testing! Good point, I agree with your conclusion; the small change to alloc_var() suggested in 0003-fixed-buf.patch probably didn't provide enough speedups to motivate the change. In the new attached patch, Andres fixed buffer idea has been implemented throughout the entire numeric.c code base. CHANGES: * Instead of a bool, we now have a buf_len struct field, to keep track of the allocated capacity, which can be different from the numbers of digits currently used. buf_len == 0 indicates no memory is allocated, and fixed_buf is possibly used instead. * alloc_var() now reuses the allocated buffer if there is one big enough. * free_var() and zero_var() only call digitbuf_free() if they need to. * set_var_from_var() now also uses the fixed buffer and also reuses the allocated buffer if there is one and it's big enough. * To allow the NumericVar's on the stack to be used without having to allocate digits buf, we have to change callers e.g. add_var(), div_var(), etc, to ensure the result variable isn't the same as one of the operand NumericVars. This wasn't necessary before, since we always allocated a new digits buf for the result, which could then be assigned to the digits field when done with computations. However, this prevented us from relying solely on the existing NumericVar stack variables, so we needed a few new temporary NumericVar stack variables, to hold intermediate results, and set_var_from_var() to copy the operand into the temp var. Assert()'s have been added to such functions, add_abs(), sub_abs(), div_var(), div_var_fast(), div_var_int64() and div_var_int64(), that enforce result being a different object than the two operands. Here is one example from ceil_var() on this from the new 0004-fixed-buf.patch: HEAD: if (var->sign == NUMERIC_POS && cmp_var(var, &tmp) != 0) add_var(&tmp, &const_one, &tmp); set_var_from_var(&tmp, result); The add_var() is a problem since &tmp is both the first operand and the result. Funnily enough, the fix in this particular case, and in floor_var(), is simpler and should be faster: 0005-fixed-buf.patch: if (var->sign == NUMERIC_POS && cmp_var(var, &tmp) != 0) add_var(&tmp, &const_one, result); else set_var_from_var(&tmp, result); This is avoids the set_var_from_var() if the if-branch is taken, as the result is written directly to result. Another example from sqrt_var(): HEAD: add_var(&q_var, &a1_var, &q_var); 0005-fixed-buf.patch: set_var_from_var(&q_var, &tmp_var); add_var(&tmp_var, &a1_var, &q_var); The extra set_var_from_var() seems to be a net-win in many cases, except for the generate_series() example which doesn't have any NumericVar's on the stack, except for the first iteration. BENCHMARK: -- SELECT max(a + b + '17'::numeric + c) FROM (SELECT generate_series(1::numeric, 1000::numeric)) aa(a), (SELECT generate_series(1::numeric, 100::numeric)) bb(b), (SELECT generate_series(1::numeric, 10::numeric)) cc(c); HEAD: Time: 158.698 ms Time: 142.486 ms Time: 141.443 ms Time: 142.044 ms Time: 141.651 ms 0005-fixed-buf.patch: Time: 156.371 ms Time: 149.857 ms Time: 150.674 ms Time: 150.409 ms Time: 150.461 ms SELECT setseed(0.1234); CREATE TABLE t (n1 numeric, n2 numeric, n3 numeric, n4 numeric); INSERT INTO t (n1, n2, n3, n4) SELECT round(random()::numeric,2), round(random()::numeric*10,2), round(random()::numeric*100,2), round(random()::numeric*1000,2) FROM generate_series(1,1e7); CHECKPOINT; SELECT SUM(n1+n2+n3+n4) FROM t; HEAD: Time: 758.489 ms Time: 646.794 ms Time: 643.237 ms Time: 642.620 ms Time: 646.218 ms 0005-fixed-buf.patch: Time: 748.093 ms Time: 628.799 ms Time: 629.853 ms Time: 629.166 ms Time: 627.768 ms CREATE TABLE ledger (amount numeric); INSERT INTO ledger (amount) SELECT generate_series(-10.00,10,0.01); CHECKPOINT; SELECT SUM(amount*1.25 + 0.5) FROM ledger; HEAD: Time: 1113.080 ms (00:01.113) Time: 931.998 ms Time: 931.009 ms Time: 932.476 ms Time: 933.509 ms 0005-fixed-buf.patch: Time: 1067.298 ms (00:01.067) Time: 883.972 ms Time: 880.165 ms Time: 882.465 ms Time: 893.646 ms CREATE TEMP TABLE foo(c1 numeric, c2 numeric, c3 numeric, c4 numeric, c5 numeric, c6 numeric, c7 numeric, c8 numeric, c9 numeric, c10 numeric
Re: Missing free_var() at end of accum_sum_final()?
On 20.02.23 23:16, Joel Jacobson wrote: In the new attached patch, Andres fixed buffer idea has been implemented throughout the entire numeric.c code base. I think the definition of the "preinitialized constants" could be adapted to this as well. For example, instead of static const NumericDigit const_one_data[1] = {1}; static const NumericVar const_one = {1, 0, NUMERIC_POS, 0, NULL, (NumericDigit *) const_one_data}; it could be something like static const NumericVar const_one = {0, 0, NUMERIC_POS, 0, NULL, NULL, 1, {1}}; Or perhaps with designators: static const NumericVar const_one = {.sign = NUMERIC_POS, .buflen = 1, .fixed_buf = {1}};
Re: Missing free_var() at end of accum_sum_final()?
> On 20.02.23 23:16, Joel Jacobson wrote: > > In the new attached patch, Andres fixed buffer idea has been implemented > > throughout the entire numeric.c code base. > I have been going over this patch, and IMO it's far too invasive for the fairly modest performance gains (and performance regressions in some cases) that it gives (which seem to be somewhat smaller on my machine). One code change that I am particularly opposed to is changing all the low-level functions like add_var(), mul_var(), etc., so that they no longer accept the result being the same variable as any of the inputs. That is a particularly convenient thing to be able to do, and without it, various functions become more complex and less readable, and have to resort to using more temporary variables. I actually find the whole business of attaching a static buffer and new buf_len fields to NumericVar quite ugly, and the associated extra complexity in alloc_var(), free_var(), zero_var(), and set_var_from_var() is all part of that. Now that might be worth it, if this gave significant performance gains across the board, but the trouble is it doesn't. AFAICS, it seems to be just as likely to degrade performance. For example: SELECT sqrt(6*sum(1/x/x)) FROM generate_series(1::numeric ,1000::numeric) g(x); is consistently 1-2% slower for me, with this patch. That's not much, but then neither are most of the gains. In a lot of cases, it's so close to the level of noise that I don't think most users will notice one way or the other. So IMO the results just don't justify such an extensive set of changes, and I think we should abandon this fixed buffer approach. Having said that, I think the results from the COPY test are worth looking at more closely. Your results seem to suggest around a 5% improvement. On my machine it was only around 3%, but that still might be worth having, if it didn't involve such invasive changes throughout the rest of the code. As an experiment, I took another look at my earlier patch, making make_result() construct the result using the same allocated memory as the variable's digit buffer (if allocated). That eliminates around a third of all free_var() calls from numeric.c, and for most functions, it saves both a palloc() and a pfree(). In the case of numeric_in(), I realised that it's possible to go further, by reusing the decimal digits buffer for the NumericVar's digits, and then later for the final Numeric result. Also, by carefully aligning things, it's possible to arrange it so that the final make_result() doesn't need to copy/move the digits at all. With that I get something closer to a 15% improvement in the COPY test, which is definitely worth having. In the pi series above, it gave a 3-4% performance improvement, and that seemed to be a common pattern across a number of other tests. It's also a much less invasive change, since it's only really changing make_result(), which makes the knock-on effects much more manageable, and reduces the chances of any performance regressions. I didn't do all the tests that you did though, so it would be interesting to see how it fares in those. Regards, Dean diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c new file mode 100644 index a83feea..c024fcf --- a/src/backend/utils/adt/numeric.c +++ b/src/backend/utils/adt/numeric.c @@ -474,8 +474,14 @@ static void dump_var(const char *str, Nu #define dump_var(s,v) #endif +/* + * Macros to allocate/free numeric digit buffers. Whenever we allocate a digit + * buffer, we give it an extra NUMERIC_HDRSZ (8 bytes) of space, so that the + * same memory block can be used by make_result() to construct a Numeric result + * from it, avoiding palloc/pfree overhead. + */ #define digitbuf_alloc(ndigits) \ - ((NumericDigit *) palloc((ndigits) * sizeof(NumericDigit))) + ((NumericDigit *) palloc(NUMERIC_HDRSZ + (ndigits) * sizeof(NumericDigit))) #define digitbuf_free(buf) \ do { \ if ((buf) != NULL) \ @@ -783,8 +789,6 @@ numeric_in(PG_FUNCTION_ARGS) ereturn(escontext, (Datum) 0, (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), errmsg("value overflows numeric format"))); - - free_var(&value); } PG_RETURN_NUMERIC(res); @@ -1141,8 +1145,6 @@ numeric_recv(PG_FUNCTION_ARGS) (void) apply_typmod_special(res, typmod, NULL); } - free_var(&value); - PG_RETURN_NUMERIC(res); } @@ -1305,8 +1307,6 @@ numeric (PG_FUNCTION_ARGS) (void) apply_typmod(&var, typmod, NULL); new = make_result(&var); - free_var(&var); - PG_RETURN_NUMERIC(new); } @@ -1566,7 +1566,6 @@ numeric_round(PG_FUNCTION_ARGS) */ res = make_result(&arg); - free_var(&arg); PG_RETURN_NUMERIC(res); } @@ -1615,7 +1614,6 @@ numeric_trunc(PG_FUNCTION_ARGS) */ res = make_result(&arg); - free_var(&arg); PG_RETURN_NUMERIC(res); } @@ -1642,7 +1640,6 @@ numeric_ceil(PG_FUNCTION_ARGS) ceil_var(&result, &result); res = make_result(&result); - free_var(&result); PG_RETURN_NUMERIC(res); }
Re: Missing free_var() at end of accum_sum_final()?
On Fri, Mar 3, 2023, at 16:11, Dean Rasheed wrote: > So IMO the results just don't justify such an extensive set of > changes, and I think we should abandon this fixed buffer approach. I agree. I was hoping it would be possible to reduce the invasiveness, but I think it's difficult and probably not worth it. > Having said that, I think the results from the COPY test are worth > looking at more closely. Your results seem to suggest around a 5% > improvement. On my machine it was only around 3%, but that still might > be worth having, if it didn't involve such invasive changes throughout > the rest of the code. > > As an experiment, I took another look at my earlier patch, making > make_result() construct the result using the same allocated memory as > the variable's digit buffer (if allocated). That eliminates around a > third of all free_var() calls from numeric.c, and for most functions, > it saves both a palloc() and a pfree(). In the case of numeric_in(), I > realised that it's possible to go further, by reusing the decimal > digits buffer for the NumericVar's digits, and then later for the > final Numeric result. Also, by carefully aligning things, it's > possible to arrange it so that the final make_result() doesn't need to > copy/move the digits at all. With that I get something closer to a 15% > improvement in the COPY test, which is definitely worth having. Nice! Patch LGTM. > I didn't do all the tests that you did though, so it would be > interesting to see how it fares in those. SELECT count(*) FROM generate_series(1::numeric, 1000::numeric); Time: 1196.801 ms (00:01.197) -- HEAD Time: 1278.376 ms (00:01.278) -- make-result-using-vars-buf-v2.patch TRUNCATE foo; COPY foo FROM '/tmp/random-numerics.csv'; Time: 8450.551 ms (00:08.451) -- HEAD Time: 7176.838 ms (00:07.177) -- make-result-using-vars-buf-v2.patch SELECT SUM(n1+n2+n3+n4) FROM t; Time: 643.961 ms -- HEAD Time: 620.303 ms -- make-result-using-vars-buf-v2.patch SELECT max(a + b + '17'::numeric + c) FROM (SELECT generate_series(1::numeric, 1000::numeric)) aa(a), (SELECT generate_series(1::numeric, 100::numeric)) bb(b), (SELECT generate_series(1::numeric, 10::numeric)) cc(c); Time: 141.070 ms -- HEAD Time: 139.562 ms -- make-result-using-vars-buf-v2.patch SELECT SUM(amount*1.25 + 0.5) FROM ledger; Time: 933.461 ms -- HEAD Time: 862.619 ms -- make-result-using-vars-buf-v2.patch Looks like a win in all cases except the first one. Great work! /Joel
Re: Missing free_var() at end of accum_sum_final()?
On Fri, Mar 3, 2023, at 16:11, Dean Rasheed wrote: > Attachments: > * make-result-using-vars-buf-v2.patch One suggestion: maybe add a comment explaining why the allocated buffer which size is based on strlen(cp) for the decimal digit values, is guaranteed to be large enough also for the result's digit buffer? I.e. some kind of proof why (NUMERIC_HDRSZ + strlen(cp) + DEC_DIGITS * 2) >= ((ndigits + 1) * sizeof(NumericDigit))) holds true in general. /Joel
Re: Missing free_var() at end of accum_sum_final()?
On 05.03.23 09:53, Joel Jacobson wrote: On Fri, Mar 3, 2023, at 16:11, Dean Rasheed wrote: Attachments: * make-result-using-vars-buf-v2.patch One suggestion: maybe add a comment explaining why the allocated buffer which size is based on strlen(cp) for the decimal digit values, is guaranteed to be large enough also for the result's digit buffer? I.e. some kind of proof why (NUMERIC_HDRSZ + strlen(cp) + DEC_DIGITS * 2) >= ((ndigits + 1) * sizeof(NumericDigit))) holds true in general. It looks like this thread has fizzled out, and no one is really working on the various proposed patch variants. Unless someone indicates that they are still seriously pursuing this, I will mark this patch as "Returned with feedback".