Re: A potential memory leak on Merge Join when Sort node is not below Materialize node

2022-09-29 Thread Önder Kalacı
Hi David, Tom, all,


> I've just pushed the disable byref Datums patch I posted earlier. I
> only made a small adjustment to make use of the TupleDescAttr() macro.
> Önder, thank you for the report.
>
>
With this commit, I re-run the query patterns where we observed the
problem, all looks good now. Wanted to share this information as fyi.

Thanks for the quick turnaround!

Onder KALACI


Re: A potential memory leak on Merge Join when Sort node is not below Materialize node

2022-09-29 Thread Peter Geoghegan
On Thu, Sep 29, 2022 at 7:10 AM Tom Lane  wrote:
> TBH, I think this is completely ridiculous over-optimization.
> There's exactly zero evidence that a second copy of the function
> would improve performance, or do anything but contribute to code
> bloat (which does have a distributed performance cost).

I thought that that was unjustified myself.

-- 
Peter Geoghegan




Re: A potential memory leak on Merge Join when Sort node is not below Materialize node

2022-09-29 Thread Ronan Dunklau
Le jeudi 29 septembre 2022, 16:10:03 CEST Tom Lane a écrit :
> Ronan Dunklau  writes:
> >> Yeah, I think the same rules around scope apply as
> >> tuplesort_gettupleslot() with copy==false.  We could do it by adding a
> >> copy flag to the existing function, but I'd rather not add the
> >> branching to that function. It's probably just better to duplicate it
> >> and adjust.
> > 
> > For the record, I tried to see if gcc would optimize the function by
> > generating two different versions when copy is true or false, thus getting 
rid
> > of the branching while still having only one function to deal with.
> 
> TBH, I think this is completely ridiculous over-optimization.
> There's exactly zero evidence that a second copy of the function
> would improve performance, or do anything but contribute to code
> bloat (which does have a distributed performance cost).

I wasn't commenting on the merit of the optimization, but just that I tried to 
get gcc to apply it itself, which it doesn't.

Regards,

-- 
Ronan Dunklau






Re: A potential memory leak on Merge Join when Sort node is not below Materialize node

2022-09-29 Thread Tom Lane
Ronan Dunklau  writes:
>> Yeah, I think the same rules around scope apply as
>> tuplesort_gettupleslot() with copy==false.  We could do it by adding a
>> copy flag to the existing function, but I'd rather not add the
>> branching to that function. It's probably just better to duplicate it
>> and adjust.

> For the record, I tried to see if gcc would optimize the function by 
> generating two different versions when copy is true or false, thus getting 
> rid 
> of the branching while still having only one function to deal with.

TBH, I think this is completely ridiculous over-optimization.
There's exactly zero evidence that a second copy of the function
would improve performance, or do anything but contribute to code
bloat (which does have a distributed performance cost).

regards, tom lane




Re: A potential memory leak on Merge Join when Sort node is not below Materialize node

2022-09-29 Thread Ronan Dunklau
> I've just pushed the disable byref Datums patch I posted earlier. I
> only made a small adjustment to make use of the TupleDescAttr() macro.
> Önder, thank you for the report.

Thank you David for taking care of this.

> Yeah, I think the same rules around scope apply as
> tuplesort_gettupleslot() with copy==false.  We could do it by adding a
> copy flag to the existing function, but I'd rather not add the
> branching to that function. It's probably just better to duplicate it
> and adjust.
> 

For the record, I tried to see if gcc would optimize the function by 
generating two different versions when copy is true or false, thus getting rid 
of the branching while still having only one function to deal with. Using the 
-fipa-cp-clone (or even the whole set of additional flags coming with -O3), it 
does generate a special-case version of the function, but it seems to then 
only be used by heapam_index_validate_scan and 
percentile_cont_multi_final_common. This is from my investigation looking for 
references to the specialized version in the DWARF debug information.

Regards,

-- 
Ronan Dunklau






Re: A potential memory leak on Merge Join when Sort node is not below Materialize node

2022-09-28 Thread Peter Geoghegan
On Wed, Sep 28, 2022 at 9:59 PM David Rowley  wrote:
> select b from t1 order by b offset 100;
>
> Master:
> latency average = 344.763 ms
>
> Patched:
> latency average = 268.374 ms
>
> about 28% faster.

That's more like it!

-- 
Peter Geoghegan




Re: A potential memory leak on Merge Join when Sort node is not below Materialize node

2022-09-28 Thread David Rowley
On Thu, 29 Sept 2022 at 14:32, Peter Geoghegan  wrote:
>
> On Wed, Sep 28, 2022 at 6:13 PM David Rowley  wrote:
> > Master:
> > latency average = 313.197 ms
> >
> > Patched:
> > latency average = 304.335 ms
> >
> > So not a very impressive speedup there (about 3%)
>
> Worth a try, at least. Having a more consistent interface is valuable
> in itself too.

Just testing the datum sort in nodeSort.c with the same table as
before but using the query:

select b from t1 order by b offset 100;

Master:
latency average = 344.763 ms

Patched:
latency average = 268.374 ms

about 28% faster.

I'll take this to another thread and put it in the next CF

David




Re: A potential memory leak on Merge Join when Sort node is not below Materialize node

2022-09-28 Thread Peter Geoghegan
On Wed, Sep 28, 2022 at 6:13 PM David Rowley  wrote:
> Master:
> latency average = 313.197 ms
>
> Patched:
> latency average = 304.335 ms
>
> So not a very impressive speedup there (about 3%)

Worth a try, at least. Having a more consistent interface is valuable
in itself too.

-- 
Peter Geoghegan




Re: A potential memory leak on Merge Join when Sort node is not below Materialize node

2022-09-28 Thread David Rowley
On Thu, 29 Sept 2022 at 12:07, Peter Geoghegan  wrote:
> Also potentially relevant: the 2017 commit fa117ee4 anticipated adding
> a "copy" argument to tuplesort_getdatum() (the same commit added such
> a "copy" argument to tuplesort_gettupleslot()). I see that that still
> hasn't happened to tuplesort_getdatum() all these years later. Might
> be a good idea to do it in the next year or two, though.
>
> If David is interested in pursuing this now then I certainly won't object.

Just while this is fresh in my head, I wrote some code to make this
happen.  My preference would be not to add the "copy" param to the
existing function and instead just add a new function to prevent
additional branching.

The attached puts back the datum sort in nodeSort.c for byref types
and adjusts process_ordered_aggregate_single() to make use of this
function.

I did a quick benchmark to see if this help DISTINCT aggregate any:

create table t1 (a varchar(32) not null, b varchar(32) not null);
insert into t1 select md5((x%10)::text),md5((x%10)::text) from
generate_Series(1,100)x;
vacuum freeze t1;
create index on t1(a);

With a work_mem of 256MBs I get:

query = select max(distinct a), max(distinct b) from t1;

Master:
latency average = 313.197 ms

Patched:
latency average = 304.335 ms

So not a very impressive speedup there (about 3%)

Some excerpts from perf top show:

Master:
   1.40%  postgres  [.] palloc
   1.13%  postgres  [.] tuplesort_getdatum
   0.77%  postgres  [.] datumCopy

Patched:
   0.91%  postgres  [.] tuplesort_getdatum_nocopy
   0.65%  postgres  [.] palloc

I stared for a while at the mode_final() function and thought maybe we
could use the nocopy variant there. I just didn't quite pluck up the
motivation to write any code to see if it could be made faster.

David
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index fe74e49814..b5f63b5a2b 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -878,8 +878,8 @@ process_ordered_aggregate_single(AggState *aggstate,
 * pfree them when they are no longer needed.
 */
 
-   while (tuplesort_getdatum(pertrans->sortstates[aggstate->current_set],
- true, newVal, isNull, 
))
+   while 
(tuplesort_getdatum_nocopy(pertrans->sortstates[aggstate->current_set],
+true, 
newVal, isNull, ))
{
/*
 * Clear and select the working context for evaluation of the 
equality
@@ -900,24 +900,33 @@ process_ordered_aggregate_single(AggState *aggstate,

 pertrans->aggCollation,

 oldVal, *newVal)
{
-   /* equal to prior, so forget this one */
-   if (!pertrans->inputtypeByVal && !*isNull)
-   pfree(DatumGetPointer(*newVal));
+   MemoryContextSwitchTo(oldContext);
+   continue;
}
else
{
advance_transition_function(aggstate, pertrans, 
pergroupstate);
-   /* forget the old value, if any */
-   if (!oldIsNull && !pertrans->inputtypeByVal)
-   pfree(DatumGetPointer(oldVal));
-   /* and remember the new one for subsequent equality 
checks */
-   oldVal = *newVal;
+
+   MemoryContextSwitchTo(oldContext);
+
+   /*
+* Forget the old value, if any and remember the new 
one for
+* subsequent equality checks.
+*/
+   if (!pertrans->inputtypeByVal)
+   {
+   if (!oldIsNull)
+   pfree(DatumGetPointer(oldVal));
+   if (!*isNull)
+   oldVal = datumCopy(*newVal, 
pertrans->inputtypeByVal,
+  
pertrans->inputtypeLen);
+   }
+   else
+   oldVal = *newVal;
oldAbbrevVal = newAbbrevVal;
oldIsNull = *isNull;
haveOldVal = true;
}
-
-   MemoryContextSwitchTo(oldContext);
}
 
if (!oldIsNull && !pertrans->inputtypeByVal)
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index 37ad35704b..f418be30a1 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -197,8 +197,8 @@ 

Re: A potential memory leak on Merge Join when Sort node is not below Materialize node

2022-09-28 Thread Michael Paquier
On Wed, Sep 28, 2022 at 07:35:07PM -0400, Tom Lane wrote:
> Michael Paquier  writes:
>> Wouldn't it be better to have 3a58176 reflect the non-optimization
>> path in the EXPLAIN output of a new regression test if none of the
>> existing tests are able to show any difference?
> 
> This decision is not visible in EXPLAIN in any case.

Okay, thanks!
--
Michael


signature.asc
Description: PGP signature


Re: A potential memory leak on Merge Join when Sort node is not below Materialize node

2022-09-28 Thread David Rowley
On Thu, 29 Sept 2022 at 12:30, Michael Paquier  wrote:
>
> On Thu, Sep 29, 2022 at 11:58:17AM +1300, David Rowley wrote:
> > I've just pushed the disable byref Datums patch I posted earlier. I
> > only made a small adjustment to make use of the TupleDescAttr() macro.
> > Önder, thank you for the report.
>
> Wouldn't it be better to have 3a58176 reflect the non-optimization
> path in the EXPLAIN output of a new regression test if none of the
> existing tests are able to show any difference?

There's nothing in EXPLAIN that shows that this optimization occurs.
Or, are you proposing that you think there should be something? and
for 15??

David




Re: A potential memory leak on Merge Join when Sort node is not below Materialize node

2022-09-28 Thread Tom Lane
Michael Paquier  writes:
> Wouldn't it be better to have 3a58176 reflect the non-optimization
> path in the EXPLAIN output of a new regression test if none of the
> existing tests are able to show any difference?

This decision is not visible in EXPLAIN in any case.

regards, tom lane




Re: A potential memory leak on Merge Join when Sort node is not below Materialize node

2022-09-28 Thread Michael Paquier
On Thu, Sep 29, 2022 at 11:58:17AM +1300, David Rowley wrote:
> I've just pushed the disable byref Datums patch I posted earlier. I
> only made a small adjustment to make use of the TupleDescAttr() macro.
> Önder, thank you for the report.

Wouldn't it be better to have 3a58176 reflect the non-optimization
path in the EXPLAIN output of a new regression test if none of the
existing tests are able to show any difference?
--
Michael


signature.asc
Description: PGP signature


Re: A potential memory leak on Merge Join when Sort node is not below Materialize node

2022-09-28 Thread Peter Geoghegan
On Wed, Sep 28, 2022 at 4:00 PM Peter Geoghegan  wrote:
> I am reminded of the discussion that led to bugfix commit c2d4eb1b
> some years back.

Also potentially relevant: the 2017 commit fa117ee4 anticipated adding
a "copy" argument to tuplesort_getdatum() (the same commit added such
a "copy" argument to tuplesort_gettupleslot()). I see that that still
hasn't happened to tuplesort_getdatum() all these years later. Might
be a good idea to do it in the next year or two, though.

If David is interested in pursuing this now then I certainly won't object.

-- 
Peter Geoghegan




Re: A potential memory leak on Merge Join when Sort node is not below Materialize node

2022-09-28 Thread Peter Geoghegan
On Wed, Sep 28, 2022 at 12:57 PM Tom Lane  wrote:
> David Rowley  writes:
> > I'm wondering if the best way to fix it if doing it that way would be
> > to invent tuplesort_getdatum_nocopy() which would be the same as
> > tuplesort_getdatum() except it wouldn't do the datumCopy for byref
> > types.
>
> Yeah, perhaps.  We'd need a clear spec on how long the Datum could
> be presumed good --- probably till the next tuplesort_getdatum_nocopy
> call, but that'd need to be checked --- and then check if that is
> satisfactory for nodeSort's purposes.

I am reminded of the discussion that led to bugfix commit c2d4eb1b
some years back.

As the commit message of that old bugfix notes, tuplesort_getdatum()
and tuplesort_gettupleslot() are "the odd ones out" among "get tuple"
routines (i.e. routines that get a tuple from a tuplesort by calling
tuplesort_gettuple_common()). We used to sometimes do that with
tuplesort_getindextuple() and possible other such routines, but the
need for that capability was eliminated on the caller side around the
same time as the bugfix went in.

-- 
Peter Geoghegan




Re: A potential memory leak on Merge Join when Sort node is not below Materialize node

2022-09-28 Thread David Rowley
On Thu, 29 Sept 2022 at 08:57, Tom Lane  wrote:
>
> David Rowley  writes:
> > I'm wondering if the best way to fix it if doing it that way would be
> > to invent tuplesort_getdatum_nocopy() which would be the same as
> > tuplesort_getdatum() except it wouldn't do the datumCopy for byref
> > types.
>
> Yeah, perhaps.  We'd need a clear spec on how long the Datum could
> be presumed good --- probably till the next tuplesort_getdatum_nocopy
> call, but that'd need to be checked --- and then check if that is
> satisfactory for nodeSort's purposes.

Yeah, I think the same rules around scope apply as
tuplesort_gettupleslot() with copy==false.  We could do it by adding a
copy flag to the existing function, but I'd rather not add the
branching to that function. It's probably just better to duplicate it
and adjust.

> If we had such a thing, I wonder if any of the other existing
> tuplesort_getdatum callers would be happier with that.  nodeAgg for
> one is tediously freeing the result, but could we drop that logic?

Looking at process_ordered_aggregate_single(), it's likely more
efficient to use the nocopy version and just perform a datumCopy()
when we need to store the oldVal.  At least, that would be more
efficient when many values are being skipped due to being the same as
the last one.

I've just pushed the disable byref Datums patch I posted earlier. I
only made a small adjustment to make use of the TupleDescAttr() macro.
Önder, thank you for the report.

David




Re: A potential memory leak on Merge Join when Sort node is not below Materialize node

2022-09-28 Thread Tom Lane
David Rowley  writes:
> I'm wondering if the best way to fix it if doing it that way would be
> to invent tuplesort_getdatum_nocopy() which would be the same as
> tuplesort_getdatum() except it wouldn't do the datumCopy for byref
> types.

Yeah, perhaps.  We'd need a clear spec on how long the Datum could
be presumed good --- probably till the next tuplesort_getdatum_nocopy
call, but that'd need to be checked --- and then check if that is
satisfactory for nodeSort's purposes.

If we had such a thing, I wonder if any of the other existing
tuplesort_getdatum callers would be happier with that.  nodeAgg for
one is tediously freeing the result, but could we drop that logic?
(I hasten to add that I'm not proposing we touch that for v15.)

regards, tom lane




Re: A potential memory leak on Merge Join when Sort node is not below Materialize node

2022-09-28 Thread David Rowley
Thanks for investigating this and finding the guilty commit.

On Thu, 29 Sept 2022 at 07:34, Tom Lane  wrote:
> After looking at that for a little while, I wonder if we shouldn't
> fix this by restricting the Datum-sort path to be used only with
> pass-by-value data types.  That'd require only a minor addition
> to the new logic in ExecInitSort.

I'm also wondering if that's the best fix given the timing of this discovery.

> The alternative of inserting a pfree of the old value would complicate
> the code nontrivially, I think, and really it would necessitate a
> complete performance re-test.  I'm wondering if the claimed speedup
> for pass-by-ref types wasn't fictional and based on skipping the
> required pfrees.  Besides, if you think this code is hot enough that
> you don't want to add a test-and-branch per tuple (a claim I also
> doubt, BTW) then you probably don't want to add such overhead into
> the pass-by-value case where the speedup is clear.

I'm wondering if the best way to fix it if doing it that way would be
to invent tuplesort_getdatum_nocopy() which would be the same as
tuplesort_getdatum() except it wouldn't do the datumCopy for byref
types.  It looks like tuplesort_gettupleslot() when copy==false just
directly stores the MinimalTuple that's in stup.tuple and shouldFree
is set to false.

Going by [1], it looks like I saw gains in test 6, which was a byref
Datum. Skipping the datumCopy() I imagine could only make the gains
slightly higher on that.  That puts me a bit more on the fence about
the best fix for PG15.

I've attached a patch to restrict the optimisation to byval types in
the meantime.

David

[1] 
https://www.postgresql.org/message-id/CAApHDvrWV%3Dv0qKsC9_BHqhCn9TusrNvCaZDz77StCO--fmgbKA%40mail.gmail.com
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index 3c28d60c3e..740ad37717 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -220,6 +220,7 @@ SortState *
 ExecInitSort(Sort *node, EState *estate, int eflags)
 {
SortState  *sortstate;
+   TupleDesc   outerTupDesc;
 
SO1_printf("ExecInitSort: %s\n",
   "initializing sort node");
@@ -274,11 +275,13 @@ ExecInitSort(Sort *node, EState *estate, int eflags)
ExecInitResultTupleSlotTL(>ss.ps, );
sortstate->ss.ps.ps_ProjInfo = NULL;
 
+   outerTupDesc = ExecGetResultType(outerPlanState(sortstate));
+
/*
-* We perform a Datum sort when we're sorting just a single column,
+* We perform a Datum sort when we're sorting just a single byval 
column,
 * otherwise we perform a tuple sort.
 */
-   if (ExecGetResultType(outerPlanState(sortstate))->natts == 1)
+   if (outerTupDesc->natts == 1 && outerTupDesc->attrs[0].attbyval)
sortstate->datumSort = true;
else
sortstate->datumSort = false;


Re: A potential memory leak on Merge Join when Sort node is not below Materialize node

2022-09-28 Thread Tom Lane
I wrote:
> and bisecting fingers this commit as the guilty party:

> commit 91e9e89dccdfdf4216953d3d8f5515dcdef177fb
> Author: David Rowley 
> Date:   Thu Jul 22 14:03:19 2021 +1200

> Make nodeSort.c use Datum sorts for single column sorts

After looking at that for a little while, I wonder if we shouldn't
fix this by restricting the Datum-sort path to be used only with
pass-by-value data types.  That'd require only a minor addition
to the new logic in ExecInitSort.

The alternative of inserting a pfree of the old value would complicate
the code nontrivially, I think, and really it would necessitate a
complete performance re-test.  I'm wondering if the claimed speedup
for pass-by-ref types wasn't fictional and based on skipping the
required pfrees.  Besides, if you think this code is hot enough that
you don't want to add a test-and-branch per tuple (a claim I also
doubt, BTW) then you probably don't want to add such overhead into
the pass-by-value case where the speedup is clear.

regards, tom lane




Re: A potential memory leak on Merge Join when Sort node is not below Materialize node

2022-09-28 Thread Tom Lane
=?UTF-8?B?w5ZuZGVyIEthbGFjxLE=?=  writes:
> With PG 15 (rc1 or beta4), I'm observing an interesting memory pattern.

Yup, that is a leak.  valgrind'ing it blames this call chain:

==00:00:16:12.228 4011013== 790,404,056 bytes in 60,800,312 blocks are 
definitely lost in loss record 1,108 of 1,108
==00:00:16:12.228 4011013==at 0x9A5104: palloc (mcxt.c:1170)
==00:00:16:12.228 4011013==by 0x89F8D9: datumCopy (datum.c:175)
==00:00:16:12.228 4011013==by 0x9B5BEE: tuplesort_getdatum 
(tuplesortvariants.c:882)
==00:00:16:12.228 4011013==by 0x6FA8B3: ExecSort (nodeSort.c:200)
==00:00:16:12.228 4011013==by 0x6F1E87: ExecProcNode (executor.h:259)
==00:00:16:12.228 4011013==by 0x6F1E87: ExecMergeJoin (nodeMergejoin.c:871)
==00:00:16:12.228 4011013==by 0x6D7800: ExecProcNode (executor.h:259)
==00:00:16:12.228 4011013==by 0x6D7800: fetch_input_tuple (nodeAgg.c:562)
==00:00:16:12.228 4011013==by 0x6DAE2E: agg_retrieve_direct (nodeAgg.c:2454)
==00:00:16:12.228 4011013==by 0x6DAE2E: ExecAgg (nodeAgg.c:2174)
==00:00:16:12.228 4011013==by 0x6C6122: ExecProcNode (executor.h:259)
==00:00:16:12.228 4011013==by 0x6C6122: ExecutePlan (execMain.c:1636)

and bisecting fingers this commit as the guilty party:

commit 91e9e89dccdfdf4216953d3d8f5515dcdef177fb
Author: David Rowley 
Date:   Thu Jul 22 14:03:19 2021 +1200

Make nodeSort.c use Datum sorts for single column sorts

Looks like that forgot that tuplesort_getdatum()'s result has to
be freed by the caller.

regards, tom lane




Re: A potential memory leak on Merge Join when Sort node is not below Materialize node

2022-09-28 Thread Ranier Vilela
Em qua., 28 de set. de 2022 às 14:24, Önder Kalacı 
escreveu:

> Hi,
>
> Thanks for replying so quickly!
>
> I run your test here with a fix attached.
>>
>> Can you retake your test with the patch attached?
>>
>>
>> Unfortunately, with the patch, I still see the memory usage increase and
> get the OOMs
>
Thanks for sharing the result.

regards,
Ranier Vilela


Re: A potential memory leak on Merge Join when Sort node is not below Materialize node

2022-09-28 Thread Önder Kalacı
Hi,

Thanks for replying so quickly!

I run your test here with a fix attached.
>
> Can you retake your test with the patch attached?
>
>
> Unfortunately, with the patch, I still see the memory usage increase and
get the OOMs

Thanks,
Onder KALACI