Re: A potential memory leak on Merge Join when Sort node is not below Materialize node
Hi David, Tom, all, > I've just pushed the disable byref Datums patch I posted earlier. I > only made a small adjustment to make use of the TupleDescAttr() macro. > Önder, thank you for the report. > > With this commit, I re-run the query patterns where we observed the problem, all looks good now. Wanted to share this information as fyi. Thanks for the quick turnaround! Onder KALACI
Re: A potential memory leak on Merge Join when Sort node is not below Materialize node
On Thu, Sep 29, 2022 at 7:10 AM Tom Lane wrote: > TBH, I think this is completely ridiculous over-optimization. > There's exactly zero evidence that a second copy of the function > would improve performance, or do anything but contribute to code > bloat (which does have a distributed performance cost). I thought that that was unjustified myself. -- Peter Geoghegan
Re: A potential memory leak on Merge Join when Sort node is not below Materialize node
Le jeudi 29 septembre 2022, 16:10:03 CEST Tom Lane a écrit : > Ronan Dunklau writes: > >> Yeah, I think the same rules around scope apply as > >> tuplesort_gettupleslot() with copy==false. We could do it by adding a > >> copy flag to the existing function, but I'd rather not add the > >> branching to that function. It's probably just better to duplicate it > >> and adjust. > > > > For the record, I tried to see if gcc would optimize the function by > > generating two different versions when copy is true or false, thus getting rid > > of the branching while still having only one function to deal with. > > TBH, I think this is completely ridiculous over-optimization. > There's exactly zero evidence that a second copy of the function > would improve performance, or do anything but contribute to code > bloat (which does have a distributed performance cost). I wasn't commenting on the merit of the optimization, but just that I tried to get gcc to apply it itself, which it doesn't. Regards, -- Ronan Dunklau
Re: A potential memory leak on Merge Join when Sort node is not below Materialize node
Ronan Dunklau writes: >> Yeah, I think the same rules around scope apply as >> tuplesort_gettupleslot() with copy==false. We could do it by adding a >> copy flag to the existing function, but I'd rather not add the >> branching to that function. It's probably just better to duplicate it >> and adjust. > For the record, I tried to see if gcc would optimize the function by > generating two different versions when copy is true or false, thus getting > rid > of the branching while still having only one function to deal with. TBH, I think this is completely ridiculous over-optimization. There's exactly zero evidence that a second copy of the function would improve performance, or do anything but contribute to code bloat (which does have a distributed performance cost). regards, tom lane
Re: A potential memory leak on Merge Join when Sort node is not below Materialize node
> I've just pushed the disable byref Datums patch I posted earlier. I > only made a small adjustment to make use of the TupleDescAttr() macro. > Önder, thank you for the report. Thank you David for taking care of this. > Yeah, I think the same rules around scope apply as > tuplesort_gettupleslot() with copy==false. We could do it by adding a > copy flag to the existing function, but I'd rather not add the > branching to that function. It's probably just better to duplicate it > and adjust. > For the record, I tried to see if gcc would optimize the function by generating two different versions when copy is true or false, thus getting rid of the branching while still having only one function to deal with. Using the -fipa-cp-clone (or even the whole set of additional flags coming with -O3), it does generate a special-case version of the function, but it seems to then only be used by heapam_index_validate_scan and percentile_cont_multi_final_common. This is from my investigation looking for references to the specialized version in the DWARF debug information. Regards, -- Ronan Dunklau
Re: A potential memory leak on Merge Join when Sort node is not below Materialize node
On Wed, Sep 28, 2022 at 9:59 PM David Rowley wrote: > select b from t1 order by b offset 100; > > Master: > latency average = 344.763 ms > > Patched: > latency average = 268.374 ms > > about 28% faster. That's more like it! -- Peter Geoghegan
Re: A potential memory leak on Merge Join when Sort node is not below Materialize node
On Thu, 29 Sept 2022 at 14:32, Peter Geoghegan wrote: > > On Wed, Sep 28, 2022 at 6:13 PM David Rowley wrote: > > Master: > > latency average = 313.197 ms > > > > Patched: > > latency average = 304.335 ms > > > > So not a very impressive speedup there (about 3%) > > Worth a try, at least. Having a more consistent interface is valuable > in itself too. Just testing the datum sort in nodeSort.c with the same table as before but using the query: select b from t1 order by b offset 100; Master: latency average = 344.763 ms Patched: latency average = 268.374 ms about 28% faster. I'll take this to another thread and put it in the next CF David
Re: A potential memory leak on Merge Join when Sort node is not below Materialize node
On Wed, Sep 28, 2022 at 6:13 PM David Rowley wrote: > Master: > latency average = 313.197 ms > > Patched: > latency average = 304.335 ms > > So not a very impressive speedup there (about 3%) Worth a try, at least. Having a more consistent interface is valuable in itself too. -- Peter Geoghegan
Re: A potential memory leak on Merge Join when Sort node is not below Materialize node
On Thu, 29 Sept 2022 at 12:07, Peter Geoghegan wrote: > Also potentially relevant: the 2017 commit fa117ee4 anticipated adding > a "copy" argument to tuplesort_getdatum() (the same commit added such > a "copy" argument to tuplesort_gettupleslot()). I see that that still > hasn't happened to tuplesort_getdatum() all these years later. Might > be a good idea to do it in the next year or two, though. > > If David is interested in pursuing this now then I certainly won't object. Just while this is fresh in my head, I wrote some code to make this happen. My preference would be not to add the "copy" param to the existing function and instead just add a new function to prevent additional branching. The attached puts back the datum sort in nodeSort.c for byref types and adjusts process_ordered_aggregate_single() to make use of this function. I did a quick benchmark to see if this help DISTINCT aggregate any: create table t1 (a varchar(32) not null, b varchar(32) not null); insert into t1 select md5((x%10)::text),md5((x%10)::text) from generate_Series(1,100)x; vacuum freeze t1; create index on t1(a); With a work_mem of 256MBs I get: query = select max(distinct a), max(distinct b) from t1; Master: latency average = 313.197 ms Patched: latency average = 304.335 ms So not a very impressive speedup there (about 3%) Some excerpts from perf top show: Master: 1.40% postgres [.] palloc 1.13% postgres [.] tuplesort_getdatum 0.77% postgres [.] datumCopy Patched: 0.91% postgres [.] tuplesort_getdatum_nocopy 0.65% postgres [.] palloc I stared for a while at the mode_final() function and thought maybe we could use the nocopy variant there. I just didn't quite pluck up the motivation to write any code to see if it could be made faster. David diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c index fe74e49814..b5f63b5a2b 100644 --- a/src/backend/executor/nodeAgg.c +++ b/src/backend/executor/nodeAgg.c @@ -878,8 +878,8 @@ process_ordered_aggregate_single(AggState *aggstate, * pfree them when they are no longer needed. */ - while (tuplesort_getdatum(pertrans->sortstates[aggstate->current_set], - true, newVal, isNull, )) + while (tuplesort_getdatum_nocopy(pertrans->sortstates[aggstate->current_set], +true, newVal, isNull, )) { /* * Clear and select the working context for evaluation of the equality @@ -900,24 +900,33 @@ process_ordered_aggregate_single(AggState *aggstate, pertrans->aggCollation, oldVal, *newVal) { - /* equal to prior, so forget this one */ - if (!pertrans->inputtypeByVal && !*isNull) - pfree(DatumGetPointer(*newVal)); + MemoryContextSwitchTo(oldContext); + continue; } else { advance_transition_function(aggstate, pertrans, pergroupstate); - /* forget the old value, if any */ - if (!oldIsNull && !pertrans->inputtypeByVal) - pfree(DatumGetPointer(oldVal)); - /* and remember the new one for subsequent equality checks */ - oldVal = *newVal; + + MemoryContextSwitchTo(oldContext); + + /* +* Forget the old value, if any and remember the new one for +* subsequent equality checks. +*/ + if (!pertrans->inputtypeByVal) + { + if (!oldIsNull) + pfree(DatumGetPointer(oldVal)); + if (!*isNull) + oldVal = datumCopy(*newVal, pertrans->inputtypeByVal, + pertrans->inputtypeLen); + } + else + oldVal = *newVal; oldAbbrevVal = newAbbrevVal; oldIsNull = *isNull; haveOldVal = true; } - - MemoryContextSwitchTo(oldContext); } if (!oldIsNull && !pertrans->inputtypeByVal) diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c index 37ad35704b..f418be30a1 100644 --- a/src/backend/executor/nodeSort.c +++ b/src/backend/executor/nodeSort.c @@ -197,8 +197,8 @@
Re: A potential memory leak on Merge Join when Sort node is not below Materialize node
On Wed, Sep 28, 2022 at 07:35:07PM -0400, Tom Lane wrote: > Michael Paquier writes: >> Wouldn't it be better to have 3a58176 reflect the non-optimization >> path in the EXPLAIN output of a new regression test if none of the >> existing tests are able to show any difference? > > This decision is not visible in EXPLAIN in any case. Okay, thanks! -- Michael signature.asc Description: PGP signature
Re: A potential memory leak on Merge Join when Sort node is not below Materialize node
On Thu, 29 Sept 2022 at 12:30, Michael Paquier wrote: > > On Thu, Sep 29, 2022 at 11:58:17AM +1300, David Rowley wrote: > > I've just pushed the disable byref Datums patch I posted earlier. I > > only made a small adjustment to make use of the TupleDescAttr() macro. > > Önder, thank you for the report. > > Wouldn't it be better to have 3a58176 reflect the non-optimization > path in the EXPLAIN output of a new regression test if none of the > existing tests are able to show any difference? There's nothing in EXPLAIN that shows that this optimization occurs. Or, are you proposing that you think there should be something? and for 15?? David
Re: A potential memory leak on Merge Join when Sort node is not below Materialize node
Michael Paquier writes: > Wouldn't it be better to have 3a58176 reflect the non-optimization > path in the EXPLAIN output of a new regression test if none of the > existing tests are able to show any difference? This decision is not visible in EXPLAIN in any case. regards, tom lane
Re: A potential memory leak on Merge Join when Sort node is not below Materialize node
On Thu, Sep 29, 2022 at 11:58:17AM +1300, David Rowley wrote: > I've just pushed the disable byref Datums patch I posted earlier. I > only made a small adjustment to make use of the TupleDescAttr() macro. > Önder, thank you for the report. Wouldn't it be better to have 3a58176 reflect the non-optimization path in the EXPLAIN output of a new regression test if none of the existing tests are able to show any difference? -- Michael signature.asc Description: PGP signature
Re: A potential memory leak on Merge Join when Sort node is not below Materialize node
On Wed, Sep 28, 2022 at 4:00 PM Peter Geoghegan wrote: > I am reminded of the discussion that led to bugfix commit c2d4eb1b > some years back. Also potentially relevant: the 2017 commit fa117ee4 anticipated adding a "copy" argument to tuplesort_getdatum() (the same commit added such a "copy" argument to tuplesort_gettupleslot()). I see that that still hasn't happened to tuplesort_getdatum() all these years later. Might be a good idea to do it in the next year or two, though. If David is interested in pursuing this now then I certainly won't object. -- Peter Geoghegan
Re: A potential memory leak on Merge Join when Sort node is not below Materialize node
On Wed, Sep 28, 2022 at 12:57 PM Tom Lane wrote: > David Rowley writes: > > I'm wondering if the best way to fix it if doing it that way would be > > to invent tuplesort_getdatum_nocopy() which would be the same as > > tuplesort_getdatum() except it wouldn't do the datumCopy for byref > > types. > > Yeah, perhaps. We'd need a clear spec on how long the Datum could > be presumed good --- probably till the next tuplesort_getdatum_nocopy > call, but that'd need to be checked --- and then check if that is > satisfactory for nodeSort's purposes. I am reminded of the discussion that led to bugfix commit c2d4eb1b some years back. As the commit message of that old bugfix notes, tuplesort_getdatum() and tuplesort_gettupleslot() are "the odd ones out" among "get tuple" routines (i.e. routines that get a tuple from a tuplesort by calling tuplesort_gettuple_common()). We used to sometimes do that with tuplesort_getindextuple() and possible other such routines, but the need for that capability was eliminated on the caller side around the same time as the bugfix went in. -- Peter Geoghegan
Re: A potential memory leak on Merge Join when Sort node is not below Materialize node
On Thu, 29 Sept 2022 at 08:57, Tom Lane wrote: > > David Rowley writes: > > I'm wondering if the best way to fix it if doing it that way would be > > to invent tuplesort_getdatum_nocopy() which would be the same as > > tuplesort_getdatum() except it wouldn't do the datumCopy for byref > > types. > > Yeah, perhaps. We'd need a clear spec on how long the Datum could > be presumed good --- probably till the next tuplesort_getdatum_nocopy > call, but that'd need to be checked --- and then check if that is > satisfactory for nodeSort's purposes. Yeah, I think the same rules around scope apply as tuplesort_gettupleslot() with copy==false. We could do it by adding a copy flag to the existing function, but I'd rather not add the branching to that function. It's probably just better to duplicate it and adjust. > If we had such a thing, I wonder if any of the other existing > tuplesort_getdatum callers would be happier with that. nodeAgg for > one is tediously freeing the result, but could we drop that logic? Looking at process_ordered_aggregate_single(), it's likely more efficient to use the nocopy version and just perform a datumCopy() when we need to store the oldVal. At least, that would be more efficient when many values are being skipped due to being the same as the last one. I've just pushed the disable byref Datums patch I posted earlier. I only made a small adjustment to make use of the TupleDescAttr() macro. Önder, thank you for the report. David
Re: A potential memory leak on Merge Join when Sort node is not below Materialize node
David Rowley writes: > I'm wondering if the best way to fix it if doing it that way would be > to invent tuplesort_getdatum_nocopy() which would be the same as > tuplesort_getdatum() except it wouldn't do the datumCopy for byref > types. Yeah, perhaps. We'd need a clear spec on how long the Datum could be presumed good --- probably till the next tuplesort_getdatum_nocopy call, but that'd need to be checked --- and then check if that is satisfactory for nodeSort's purposes. If we had such a thing, I wonder if any of the other existing tuplesort_getdatum callers would be happier with that. nodeAgg for one is tediously freeing the result, but could we drop that logic? (I hasten to add that I'm not proposing we touch that for v15.) regards, tom lane
Re: A potential memory leak on Merge Join when Sort node is not below Materialize node
Thanks for investigating this and finding the guilty commit. On Thu, 29 Sept 2022 at 07:34, Tom Lane wrote: > After looking at that for a little while, I wonder if we shouldn't > fix this by restricting the Datum-sort path to be used only with > pass-by-value data types. That'd require only a minor addition > to the new logic in ExecInitSort. I'm also wondering if that's the best fix given the timing of this discovery. > The alternative of inserting a pfree of the old value would complicate > the code nontrivially, I think, and really it would necessitate a > complete performance re-test. I'm wondering if the claimed speedup > for pass-by-ref types wasn't fictional and based on skipping the > required pfrees. Besides, if you think this code is hot enough that > you don't want to add a test-and-branch per tuple (a claim I also > doubt, BTW) then you probably don't want to add such overhead into > the pass-by-value case where the speedup is clear. I'm wondering if the best way to fix it if doing it that way would be to invent tuplesort_getdatum_nocopy() which would be the same as tuplesort_getdatum() except it wouldn't do the datumCopy for byref types. It looks like tuplesort_gettupleslot() when copy==false just directly stores the MinimalTuple that's in stup.tuple and shouldFree is set to false. Going by [1], it looks like I saw gains in test 6, which was a byref Datum. Skipping the datumCopy() I imagine could only make the gains slightly higher on that. That puts me a bit more on the fence about the best fix for PG15. I've attached a patch to restrict the optimisation to byval types in the meantime. David [1] https://www.postgresql.org/message-id/CAApHDvrWV%3Dv0qKsC9_BHqhCn9TusrNvCaZDz77StCO--fmgbKA%40mail.gmail.com diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c index 3c28d60c3e..740ad37717 100644 --- a/src/backend/executor/nodeSort.c +++ b/src/backend/executor/nodeSort.c @@ -220,6 +220,7 @@ SortState * ExecInitSort(Sort *node, EState *estate, int eflags) { SortState *sortstate; + TupleDesc outerTupDesc; SO1_printf("ExecInitSort: %s\n", "initializing sort node"); @@ -274,11 +275,13 @@ ExecInitSort(Sort *node, EState *estate, int eflags) ExecInitResultTupleSlotTL(>ss.ps, ); sortstate->ss.ps.ps_ProjInfo = NULL; + outerTupDesc = ExecGetResultType(outerPlanState(sortstate)); + /* -* We perform a Datum sort when we're sorting just a single column, +* We perform a Datum sort when we're sorting just a single byval column, * otherwise we perform a tuple sort. */ - if (ExecGetResultType(outerPlanState(sortstate))->natts == 1) + if (outerTupDesc->natts == 1 && outerTupDesc->attrs[0].attbyval) sortstate->datumSort = true; else sortstate->datumSort = false;
Re: A potential memory leak on Merge Join when Sort node is not below Materialize node
I wrote: > and bisecting fingers this commit as the guilty party: > commit 91e9e89dccdfdf4216953d3d8f5515dcdef177fb > Author: David Rowley > Date: Thu Jul 22 14:03:19 2021 +1200 > Make nodeSort.c use Datum sorts for single column sorts After looking at that for a little while, I wonder if we shouldn't fix this by restricting the Datum-sort path to be used only with pass-by-value data types. That'd require only a minor addition to the new logic in ExecInitSort. The alternative of inserting a pfree of the old value would complicate the code nontrivially, I think, and really it would necessitate a complete performance re-test. I'm wondering if the claimed speedup for pass-by-ref types wasn't fictional and based on skipping the required pfrees. Besides, if you think this code is hot enough that you don't want to add a test-and-branch per tuple (a claim I also doubt, BTW) then you probably don't want to add such overhead into the pass-by-value case where the speedup is clear. regards, tom lane
Re: A potential memory leak on Merge Join when Sort node is not below Materialize node
=?UTF-8?B?w5ZuZGVyIEthbGFjxLE=?= writes: > With PG 15 (rc1 or beta4), I'm observing an interesting memory pattern. Yup, that is a leak. valgrind'ing it blames this call chain: ==00:00:16:12.228 4011013== 790,404,056 bytes in 60,800,312 blocks are definitely lost in loss record 1,108 of 1,108 ==00:00:16:12.228 4011013==at 0x9A5104: palloc (mcxt.c:1170) ==00:00:16:12.228 4011013==by 0x89F8D9: datumCopy (datum.c:175) ==00:00:16:12.228 4011013==by 0x9B5BEE: tuplesort_getdatum (tuplesortvariants.c:882) ==00:00:16:12.228 4011013==by 0x6FA8B3: ExecSort (nodeSort.c:200) ==00:00:16:12.228 4011013==by 0x6F1E87: ExecProcNode (executor.h:259) ==00:00:16:12.228 4011013==by 0x6F1E87: ExecMergeJoin (nodeMergejoin.c:871) ==00:00:16:12.228 4011013==by 0x6D7800: ExecProcNode (executor.h:259) ==00:00:16:12.228 4011013==by 0x6D7800: fetch_input_tuple (nodeAgg.c:562) ==00:00:16:12.228 4011013==by 0x6DAE2E: agg_retrieve_direct (nodeAgg.c:2454) ==00:00:16:12.228 4011013==by 0x6DAE2E: ExecAgg (nodeAgg.c:2174) ==00:00:16:12.228 4011013==by 0x6C6122: ExecProcNode (executor.h:259) ==00:00:16:12.228 4011013==by 0x6C6122: ExecutePlan (execMain.c:1636) and bisecting fingers this commit as the guilty party: commit 91e9e89dccdfdf4216953d3d8f5515dcdef177fb Author: David Rowley Date: Thu Jul 22 14:03:19 2021 +1200 Make nodeSort.c use Datum sorts for single column sorts Looks like that forgot that tuplesort_getdatum()'s result has to be freed by the caller. regards, tom lane
Re: A potential memory leak on Merge Join when Sort node is not below Materialize node
Em qua., 28 de set. de 2022 às 14:24, Önder Kalacı escreveu: > Hi, > > Thanks for replying so quickly! > > I run your test here with a fix attached. >> >> Can you retake your test with the patch attached? >> >> >> Unfortunately, with the patch, I still see the memory usage increase and > get the OOMs > Thanks for sharing the result. regards, Ranier Vilela
Re: A potential memory leak on Merge Join when Sort node is not below Materialize node
Hi, Thanks for replying so quickly! I run your test here with a fix attached. > > Can you retake your test with the patch attached? > > > Unfortunately, with the patch, I still see the memory usage increase and get the OOMs Thanks, Onder KALACI